PCA as regression

One way to think about principal component analysis is as matrix approximation. We have a matrix X_{T \times P} and we want a ‘smaller’ matrix Z_{T \times K}, with K < P, that stays close to the original despite its reduced dimension. Sometimes we say ‘such that Z captures the bulk of the comovement in X’. Big data technology means that the number of cross-sectional units (the number of columns in X), P, has grown very large compared to, say, the sixties. Now, with ‘Google Maps would like to use your current location’ and the future ‘Google fridge would like to access your Amazon shopping list’, you can count on P growing rapidly; we are just getting started. A lot of effort goes into this line of research, and with great leaps.

In OLS, the objective is to minimize \sum_{i=1}^{P} \Vert Y_i - X \beta_i \Vert^2 = \sum_{i=1}^{P} \Vert Y_i - \widehat{Y}_i \Vert^2. In our case there are no fitted values because there is no Y; we want to approximate X by itself in a smart way. We create the “\widehat{Y}” ourselves and call them the factors. That is also the source of the identifiability issues, and the reason we always write that some additional restrictions are needed: when you only have X, the quantity \Vert Y_i - X \beta_i \Vert^2 can be made equally small by many combinations of Y and the coefficients, so you cannot determine both uniquely.
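Written compactly, with \Lambda_{P \times K} denoting the loading matrix, one standard way to state the problem is

\min_{Z, \Lambda} \Vert X - Z \Lambda^\prime \Vert_F^2 ,

and the identification issue is that for any invertible A_{K \times K}

Z \Lambda^\prime = (Z A) \left( \Lambda (A^{-1})^\prime \right)^\prime ,

so Z and \Lambda are only pinned down once we impose normalizations such as \Lambda^\prime \Lambda = I_K.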

The most common way to establish the factors is via the spectral decomposition. Let's construct the factors for some basic ETFs. This is easy to do in R:
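The original snippet is not preserved here, so what follows is a minimal sketch: the four tickers are a hypothetical choice, and it assumes the quantmod package for the data pull.

    # a minimal sketch; the tickers are a hypothetical choice
    library(quantmod)
    tickers <- c("SPY", "XLF", "XLE", "XLK")
    prices <- do.call(merge, lapply(tickers, function(tk)
      Ad(getSymbols(tk, auto.assign = FALSE))))
    X <- scale(na.omit(prices))                  # the T x P matrix, centered and scaled
    pc <- prcomp(X, center = FALSE, scale. = FALSE)
    Z <- pc$x                                    # the factors (scores)
    W <- pc$rotation                             # the loadings
    round(W, 3)                                  # print the loadings
    # plot the series, with the first factor Z_{1:T,1} in bold
    matplot(cbind(Z[, 1], X), type = "l", lty = 1,
            lwd = c(3, rep(1, ncol(X))), col = 1)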

(Figure: the ETF series with the first factor in bold, followed by the printed loadings.)
In general, we should not do this with non-stationary series, but for illustration it is better. The bold line is the first column of the approximated matrix (Z_{1:T,1}). As you can see from the loadings, it is pretty much the average of the other four columns, which is typical. To better understand what R is doing, we can do the decomposition ourselves. Formally, the loadings are the eigenvectors corresponding to the K largest eigenvalues of (X^\prime X)_{P \times P}, which is, up to a constant, the covariance matrix of X (or the correlation matrix when X is scaled).
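Continuing the sketch above, the manual version looks like this (eigenvectors are only identified up to a sign flip, so columns may differ from prcomp's by a factor of -1):

    # the decomposition done by hand, continuing the sketch above
    S <- crossprod(X) / nrow(X)    # X'X up to a constant: the covariance of the scaled data
    e <- eigen(S)                  # eigenvalues are returned in decreasing order
    W_manual <- e$vectors          # the loadings
    Z_manual <- X %*% W_manual     # the factors: the data projected onto the loadings
    round(W_manual, 3)             # compare with pc$rotation, up to sign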

Coming back to the OLS way of thinking, these loadings are, naturally, the same as what we get if we project the factors onto the X space:
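Since Z_{1:T,1} = X W_{1:P,1}, the OLS coefficients (X^\prime X)^{-1} X^\prime Z_{1:T,1} collapse back to W_{1:P,1}. Continuing the sketch (no intercept, since the columns of X are already centered):

    # project the first factor onto the X space; the OLS coefficients are the loadings
    fit <- lm(Z[, 1] ~ X - 1)
    cbind(ols = coef(fit), pca = W[, 1])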

This regression-like way of thinking is becoming more and more relevant. When P is very large relative to T, the procedure is not stable. Following this line of thinking, we can use the tools of the regression literature, e.g. ridge regression and the LASSO, to stabilize the loading matrix using shrinkage.
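As a sketch of what shrinkage could look like here (assuming the glmnet package; this is an illustration, not a specific estimator from that literature), replace the OLS projection above with a penalized fit:

    # shrink the loading vector: ridge (alpha = 0) or LASSO (alpha = 1)
    library(glmnet)
    ridge <- cv.glmnet(X, Z[, 1], alpha = 0, intercept = FALSE, standardize = FALSE)
    lasso <- cv.glmnet(X, Z[, 1], alpha = 1, intercept = FALSE, standardize = FALSE)
    cbind(ols   = W[, 1],
          ridge = as.vector(coef(ridge, s = "lambda.min"))[-1],  # drop the (zero) intercept
          lasso = as.vector(coef(lasso, s = "lambda.min"))[-1])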

A note from the comments: Principal Component Regression (PCR) is different. There you first ‘condense’ the information in your X matrix using PCA, and then use the first few (say, the first five) principal components as regressors.
