PCA as regression

One way to think about principal component analysis is as a matrix approximation. We have a matrix X_{T \times P} and we want a ‘smaller’ matrix Z_{T \times K} with K < P. We want the new ‘smaller’ matrix to be close to the original despite its reduced dimension. Sometimes we say ‘such that Z captures the bulk of the comovement in X’. With today’s big data technology, the number of cross-sectional units P (the number of columns in X) has grown very large compared with, say, the sixties. Now, with ‘google maps would like to use your current location’ and a future ‘google fridge would like to access your amazon shopping list’, you can count on P growing exponentially; we are just getting started. A lot of effort goes into this line of research, with great leaps forward.
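To make the matrix-approximation view concrete, here is a minimal sketch in R. The data are simulated (an assumption for illustration only), and the names T_obs, factors, loadings, V_K and X_hat are hypothetical. It builds Z from the singular value decomposition of the demeaned X, forms the rank-K approximation, and checks that regressing X on Z recovers the loadings, which is one way to read the "regression" in the title.

```r
set.seed(1)
T_obs <- 200; P <- 10; K <- 2

# Simulated data (assumption): K common factors plus small idiosyncratic noise
factors  <- matrix(rnorm(T_obs * K), T_obs, K)
loadings <- matrix(rnorm(K * P), K, P)
X <- factors %*% loadings + 0.1 * matrix(rnorm(T_obs * P), T_obs, P)
X <- scale(X, center = TRUE, scale = FALSE)  # demean each column

# Singular value decomposition: X = U D V'
s   <- svd(X)
V_K <- s$v[, 1:K]        # P x K matrix of loadings
Z   <- X %*% V_K         # T x K 'smaller' matrix (principal component scores)
X_hat <- Z %*% t(V_K)    # rank-K approximation of X

# Share of the total variation in X captured by the first K components
share <- sum(s$d[1:K]^2) / sum(s$d^2)

# Regressing X on Z (no intercept) gives back the transposed loadings
B <- coef(lm(X ~ Z - 1)) # K x P coefficient matrix
```

Here share is close to one because the simulated X is driven by only K factors; with real data the share depends on how strong the comovement actually is.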

On the nonfarm payroll number

The total nonfarm payroll accounts for approximately 80% of the workers who produce the GDP of the United States. Despite the widely acknowledged fact that the nonfarm payroll number is highly volatile and heavily revised, it still drives both bond and equity market moves before and after it is published. The recent number came in at a weak 142K, compared with an average of around 200K over the past 12 months. What we would like to know now, but will only know later, is whether this number is the start of a weaker expansion in the workforce.
Despite the fact that it is definitely on the weak side (as you can see in the top panel of the figure), it is nothing unusual (as you can see in the bottom panel of the figure).

The bottom panel charts the forecast intervals, that is, the intervals you have before the number is published, from a simple AR(1) model without imposing normality. The blue and red lines are 1 and 2 standard deviations, respectively. The recent number barely scratches the bottom blue line, so there is nothing to suggest a significant shift from a healthy 200K. On the other hand, there is some persistence:

ar.ols(x = na.omit(nfp))
      1        2        3        4        5        6  
 0.2633   0.2672   0.1402   0.0841   0.1015  -0.0853  
Intercept: 0.318 (5.906) 
Order selected 6  sigma^2 estimated as  31430

So, given the positive autoregressive coefficients, we can expect, on average, a trend lower after a weak number.
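The idea of a forecast interval from an AR(1) model without imposing normality can be sketched as follows. This is not the FCIplot implementation; it is a minimal illustration on simulated data (not the actual payroll series), and using empirical residual quantiles is an assumption about one reasonable way to avoid the normality assumption.

```r
set.seed(42)
n <- 300
# Simulated stand-in for monthly payroll changes (NOT the actual NFP data)
x <- as.numeric(arima.sim(list(ar = 0.5), n = n, sd = 50)) + 200

# AR(1) estimated by OLS: x_t = a + b * x_{t-1} + e_t
y   <- x[-1]
lag <- x[-n]
fit <- lm(y ~ lag)
res <- residuals(fit)

# One-step-ahead point forecast made from the last observation
pred <- unname(coef(fit)[1] + coef(fit)[2] * x[n])

# Intervals from empirical residual quantiles instead of normal quantiles.
# The probabilities are chosen to match the coverage that +/- 1 and 2
# standard deviations would have under a normal, for comparability only.
p1 <- pnorm(c(-1, 1))
p2 <- pnorm(c(-2, 2))
int1 <- pred + quantile(res, p1, names = FALSE)  # 'one sd' band
int2 <- pred + quantile(res, p2, names = FALSE)  # 'two sd' band
```

A published number outside int2 would be unusual relative to the model; a number inside int1 is the kind of outcome the model produces most of the time. With a positive slope coefficient, a low last observation also pulls the point forecast below the long-run average, which is the persistence referred to above.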

Code for figure:

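library(quantmod) # assumption: getSymbols() fetches the data; the original omits this step
library(Eplot)    # plott() comes from Eplot; FCIplot() is assumed to as well
tempenv <- new.env()
# Assumed missing step: download total nonfarm payrolls (FRED series PAYEMS) into tempenv
getSymbols("PAYEMS", src = "FRED", env = tempenv)
time <- index(tempenv$PAYEMS)
nfp <- as.numeric(diff(tempenv$PAYEMS)) # monthly changes; the first value is NA
par(mfrow = c(2, 1)) # two stacked panels
k <- 24 # number of most recent months to show
plott(tail(nfp, k), tail(time, k), return.to.default = FALSE, main = "NFP-changes")
nfpsd <- FCIplot(nfp, k = k, rrr1 = "Rol", rrr2 = "Rol", main = "NFP-changes; forecast intervals superimposed")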

Eplot (1)

The Eplot package is on CRAN.

Easily convert your chart from this:
to this:

When you don’t have time to do Granger causality testing, you can do this:



library(Eplot) # I will be extending this package
dat <- as.matrix(read.table(file = "https://dl.dropbox.com/u/9409065/UnemployementData.txt", header = TRUE))
plot(dat[,1], main = "Unemployment Rate", xlab = "Time (1960 - 2012, Monthly)")
plott(dat[,1], main = "Unemployment Rate", xlab = "Time (1960 - 2012, Monthly)",return.to.default=TRUE)
legend('topleft',c("Unemployment Rate (LHS)","Short Rate (RHS)"),col=1:2,pch=19,lty=1,bty="n",text.col=1:2)
# Comments welcome

Bootstrap Criticism (example)

In a previous post I underlined an inherent feature of the non-parametric bootstrap: its heavy reliance on the (single) realization of the data. This feature is not a bad one per se; we just need to be aware of the limitations. From comments made on the other post, I gathered that a more concrete example can help get this point across.
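One way to see this reliance on the single realization is to bootstrap the sample maximum. The sketch below is a hypothetical illustration on simulated data, not necessarily the example used in the post: since every bootstrap sample is drawn from the observed values, no resample can ever contain something larger than what was already observed.

```r
set.seed(123)
n <- 50
x <- rnorm(n)   # the single realization we happen to observe (simulated)
B <- 2000       # number of bootstrap resamples

# Non-parametric bootstrap: resample the observed values with replacement
boot_max <- replicate(B, max(sample(x, replace = TRUE)))

# No resample can exceed the observed maximum
never_exceeds <- all(boot_max <= max(x))

# Fraction of resamples whose maximum is exactly the observed maximum;
# in theory this is 1 - (1 - 1/n)^n, roughly 0.63 for moderate n
frac_at_max <- mean(boot_max == max(x))
```

The bootstrap distribution of the maximum is therefore a lumpy distribution capped at max(x), whatever the true tail of the data looks like.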

Detecting bubbles in real time

Recently, we have been hearing a lot about a housing bubble forming in the UK. It would be great to have a formal test for identifying a bubble as it evolves in real time, but I am not familiar with any such test. However, we can still do something to help us gauge whether what we are seeing is indeed a bubbly process, which is bound to end badly.

Comments on Comments in R

When you are busy with a lengthy project, like writing a paper, you create many objects along the way. Every time you return to the project, you need to remember what is what. In the past, at the start of each working session I used to rerun the script and follow what each line was doing until I got back the objects I needed and could continue working. Apart from helping you remember what you are doing, this is very useful for reproducibility, at least given your data, in the sense that you are sure nothing was overwritten from the console and it is all there in the script. Those days are over.

Omitted Variable Bias

Frequently, we see the term ‘control variables’. The researcher introduces dozens of explanatory variables she has no interest in. This is done in order to avoid the so-called ‘Omitted Variable Bias’.
In general, the OLS estimator has great properties, not the least of which is that even with a finite number of observations it is unbiased for the marginal effect of X on Y, that is, E(\widehat{\beta}) = \beta. This is very much not the case when a variable that should be included in the model is left out. As in my previous posts about multicollinearity and heteroskedasticity, I only try to provide the intuition, since you are probably familiar with the result itself.
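A small simulation can illustrate the point. Everything here is an assumption made for illustration: the data-generating process, the coefficient values, and the variable names (w is the hypothetical variable that gets left out). The full regression recovers the true coefficient on average, while the regression that omits w does not.

```r
set.seed(2024)
n_obs <- 100    # a finite sample
n_sim <- 1000   # number of simulated samples
beta  <- 1      # true marginal effect of x on y
gamma <- 2      # true effect of w, the variable that will be omitted

est <- replicate(n_sim, {
  w <- rnorm(n_obs)
  x <- 0.5 * w + rnorm(n_obs)               # x is correlated with w
  y <- beta * x + gamma * w + rnorm(n_obs)
  c(full  = unname(coef(lm(y ~ x + w))["x"]),  # w included
    short = unname(coef(lm(y ~ x))["x"]))      # w omitted
})

avg <- rowMeans(est)  # average estimate of beta across simulations
```

In this setup the population bias of the short regression is gamma * cov(x, w) / var(x) = 2 * 0.5 / 1.25 = 0.8, so avg["short"] should land near 1.8 while avg["full"] stays near 1.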