In a previous post on this subject, we related the loadings of the principal components (PCs) from the singular value decomposition (SVD) to the regression coefficients of the PCs onto the X matrix. This is natural, since the factors are supposed to condense the information in X, and what better way to do that than to minimize the sum of squares between a linear combination of X (the factors) and the X matrix itself? A reader asked where principal component regression (PCR) enters the picture. Here we relate PCR to the usual OLS.
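A minimal sketch of the PCR/OLS link, written in Python with NumPy rather than the R used elsewhere on this blog. It assumes centered data with no separate intercept, and `pcr_coefficients` is a hypothetical helper name. Regress y on the first k principal-component scores, then map the result back through the loadings to get the PCR coefficients. When k equals the number of columns, the result coincides with OLS.

```python
import numpy as np

def pcr_coefficients(X, y, k):
    """Principal component regression using the first k components (centered data)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # SVD: Xc = U diag(s) Vt; the component scores are T = U diag(s)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Scores are orthogonal, so OLS of yc on the first k scores is gamma = diag(1/s_k) U_k' yc
    gamma = (U[:, :k].T @ yc) / s[:k]
    # T_k gamma = Xc V_k gamma, so the coefficients on the original columns are V_k gamma
    return Vt[:k].T @ gamma

def ols_coefficients(X, y):
    """OLS slopes on centered data (equivalent to including an intercept)."""
    Xc = X - X.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Xc, y - y.mean(), rcond=None)
    return coef

# Synthetic data for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -0.5, 0.0, 2.0, 0.3]) + rng.normal(size=200)

print(pcr_coefficients(X, y, 5))   # all components: same as OLS
print(ols_coefficients(X, y))
print(pcr_coefficients(X, y, 2))   # fewer components: a restricted fit
```

Dropping components restricts the fit to the leading directions of X, which is where PCR departs from OLS.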
Tag: Statistics
Quasi-Maximum Likelihood (QML) beauty
Beauty... really? Well, beauty is in the eye of the beholder.
Linking backtesting with multiple testing
The other day, Campbell Harvey from Duke University gave a talk where I work. The talk, bearing the exciting name “Backtesting”, was based on a paper of the same name.
The authors tackle the important problem of data snooping: we need to account for the fact that we conducted many trials before we found a strategy (or a variable) that ‘works’. Accessible explanations can be found here and here. In this day and age, given how many trials you can run on a desktop or laptop, the ‘story’ behind what you are doing matters more than ever.
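One standard way to account for many trials is to adjust the p-values of all the strategies tried. Below is a minimal pure-Python sketch of two familiar multiple-testing corrections, Bonferroni and Holm. The function names and the example p-values are illustrative and are not taken from the paper.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]

def holm(pvals):
    """Holm step-down: the r-th smallest p-value (r = 0, 1, ...) is multiplied by (m - r),
    with a running maximum so adjusted values never decrease."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        running = max(running, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running)
    return adjusted

# p-values of three backtested strategies (illustrative numbers)
pvals = [0.01, 0.04, 0.03]
print(bonferroni(pvals))  # approximately [0.03, 0.12, 0.09]
print(holm(pvals))        # approximately [0.03, 0.06, 0.06]
```

All three strategies "work" at 5% on their own, but only the first one remains significant after either adjustment.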
Advances in post-model-selection inference (2)
In the previous post we reviewed one way to handle the problem of inference after model selection. I recently read another related paper that approaches this complicated issue from a different angle. The paper, titled ‘A significance test for the lasso’, is a real step forward in this area. The authors derive the asymptotic distribution of a test statistic for the variables entering the lasso path, accounting for the selection step. A description of the tough problem they successfully tackle can be found here.
The usual way to test whether a variable (say variable j) adds value to your regression is the F-test. We run the regression once excluding variable j and once including it. We then compare the sums of squared errors. The resulting statistic has a known distribution, F or χ² depending on your initial assumptions, hence the F-test or the χ²-test. These are by far the most common tests for deciding whether a variable should be included. The problem arises if you searched for variable j beforehand: the search step invalidates that known distribution.
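Here is a minimal NumPy sketch of the nested-model F-test, and of what goes wrong when variable j is chosen by searching. The data are pure noise, so every rejection is a false positive. The sample size is fixed at n = 100, and the 5% critical value of F(1, 98) is hard-coded as roughly 3.94. The helper names are hypothetical.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def f_stat(y, X_restricted, X_full):
    """F statistic comparing a restricted model nested in a full model."""
    q = X_full.shape[1] - X_restricted.shape[1]   # number of restrictions
    df = len(y) - X_full.shape[1]                 # residual degrees of freedom
    rss_r, rss_f = rss(y, X_restricted), rss(y, X_full)
    return ((rss_r - rss_f) / q) / (rss_f / df)

def false_rejection_rate(search, n_candidates=10, reps=500, seed=1):
    """Share of replications where the F-test rejects at roughly 5%, on pure noise.
    search=False: always test candidate 0. search=True: test the candidate with the largest F."""
    n = 100
    crit = 3.94                     # approx. 5% critical value of F(1, 98), valid for n = 100
    rng = np.random.default_rng(seed)
    ones = np.ones((n, 1))
    hits = 0
    for _ in range(reps):
        y = rng.normal(size=n)
        X = rng.normal(size=(n, n_candidates))
        stats = [f_stat(y, ones, np.column_stack([ones, X[:, j]])) for j in range(n_candidates)]
        hits += (max(stats) if search else stats[0]) > crit
    return hits / reps

print(false_rejection_rate(search=False))  # close to 0.05
print(false_rejection_rate(search=True))   # far above 0.05
```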
Bias vs. Consistency
Especially for undergraduate students, but not only for them, the concepts of unbiasedness and consistency, and the relation between the two, are tough to get one’s head around. My aim here is to help with this. We start with a short explanation of the two concepts and follow with an illustration.
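A minimal NumPy sketch with two textbook cases, assuming normal data with true mean 0 and variance 1:

- The variance estimator with divisor n is biased (its average is (n - 1)/n) but consistent.
- Using only the first observation to estimate the mean is unbiased but not consistent, because its spread never shrinks as n grows.

The sample mean is shown for comparison. The function name is hypothetical.

```python
import numpy as np

def simulate(n, reps=4000, seed=0):
    """Monte Carlo for samples of size n from N(0, 1): true mean 0, true variance 1."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(reps, n))
    var_n = x.var(axis=1)        # divisor n (ddof=0): biased, but consistent
    first = x[:, 0]              # first observation as a mean estimator: unbiased, not consistent
    mean = x.mean(axis=1)        # sample mean: unbiased and consistent
    return {
        "var_n average": var_n.mean(),     # about (n - 1) / n
        "first-obs spread": first.std(),   # about 1 for every n
        "sample-mean spread": mean.std(),  # about 1 / sqrt(n)
    }

for n in (5, 50, 1000):
    print(n, simulate(n))
```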
Bootstrap Criticism (example)
In a previous post I underlined an inherent feature of the non-parametric bootstrap: its heavy reliance on the (single) realization of the data. This feature is not a bad one per se; we just need to be aware of its limitations. From comments on that post, I gathered that a more concrete example could help get this point across.
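A minimal NumPy sketch of that reliance, using one simulated sample of 30 normal draws whose true mean is known to be 0 (an assumption for illustration). The bootstrap distribution of the mean centers on the sample mean, whatever it happens to be, not on the true mean. Every bootstrap mean also lies within the range of the observed data.

```python
import numpy as np

rng = np.random.default_rng(42)
true_mean = 0.0
x = rng.normal(loc=true_mean, scale=1.0, size=30)   # the single realization we get to see

B = 5000
idx = rng.integers(0, len(x), size=(B, len(x)))     # resample indices with replacement
boot_means = x[idx].mean(axis=1)

print("sample mean:         ", x.mean())
print("mean of boot means:  ", boot_means.mean())   # tracks the sample mean, not true_mean
print("bootstrap std error: ", boot_means.std())
```

Draw a different x and the whole bootstrap distribution moves with the new sample mean. The bootstrap describes variability around the data we have; it cannot pull the center toward the truth.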
My favourite statistician
We are all standing on the shoulders of giants. Bradley Efron is one such giant: he invented the bootstrap in 1979, and later came his very influential 2004 paper on Least Angle Regression (with accompanying software written in R).
How Important is Variable Selection?
Very.
If you have 10 possible explanatory variables, none of which matter, you still have a good chance of finding at least one that looks important.
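The arithmetic behind "a good chance" is short. This is a minimal sketch, assuming each of the 10 irrelevant variables is tested separately at the 5% level and that the tests are independent.

```python
alpha = 0.05   # significance level of each individual test
m = 10         # number of candidate regressors, none of which matter

p_at_least_one = 1 - (1 - alpha) ** m   # P(at least one false "discovery"), assuming independent tests
expected_false = alpha * m              # expected number of false "discoveries"

print(round(p_at_least_one, 3))  # 0.401
print(expected_false)            # 0.5
```

That is roughly a 40% chance that at least one useless variable looks important.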
Quantile Autoregression in R
In the past, I wrote about robust regression, an important tool for handling outliers in the data. Roger Koenker is a major contributor in this area. His website is full of useful information and code, so visit it when you have time. The paper that drew my attention is “Quantile Autoregression”, found under his research tab; it is a significant extension of quantile regression to the time series domain. Here you will find a short demonstration of what you can do with quantile autoregression in R.
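The demonstration itself is in R. As a self-contained alternative, here is a minimal NumPy sketch of a QAR(1) fit, which models the conditional τ-quantile of y_t as a(τ) + b(τ)·y_{t-1}. It replaces the usual linear-programming solver with a brute-force search. The search relies on the fact that a linear quantile regression with two parameters has an optimal fit passing through two observations, so it is only practical for short series. The data are simulated, and `qar1` and `check_loss` are hypothetical names.

```python
import numpy as np

def check_loss(u, tau):
    """Koenker-Bassett check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0).astype(float))

def qar1(y, tau):
    """QAR(1): conditional tau-quantile of y_t modeled as a(tau) + b(tau) * y_{t-1}.
    Exact for two parameters by searching over lines through pairs of observations."""
    y = np.asarray(y, float)
    x, z = y[:-1], y[1:]                      # regressor y_{t-1}, response y_t
    i, j = np.triu_indices(len(x), k=1)       # all pairs of observations
    dx = x[j] - x[i]
    keep = np.abs(dx) > 1e-12                 # skip pairs with equal regressor values
    i, j, dx = i[keep], j[keep], dx[keep]
    slope = (z[j] - z[i]) / dx
    intercept = z[i] - slope * x[i]
    resid = z[None, :] - intercept[:, None] - slope[:, None] * x[None, :]
    best = np.argmin(check_loss(resid, tau).sum(axis=1))
    return intercept[best], slope[best]

# Synthetic AR(1) with Gaussian noise (an assumption for illustration)
rng = np.random.default_rng(3)
y = np.zeros(150)
for t in range(1, len(y)):
    y[t] = 0.5 * y[t - 1] + rng.normal()

for tau in (0.1, 0.5, 0.9):
    print(tau, qar1(y, tau))
```

With Gaussian AR(1) data, only the intercept should move across quantiles. A slope that changes with τ in real data would point to asymmetric persistence, which is what quantile autoregression is designed to reveal.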
A Simple Model for Realized Volatility
The post has two goals:
(1) Explain how to forecast volatility using a simple Heterogeneous Autoregressive (HAR) model (Corsi, 2002).
(2) Check whether higher moments such as skewness and kurtosis add forecasting value to this model.
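The HAR model regresses next-day realized volatility (RV) on three terms: its daily value, its average over the past week (5 trading days) and its average over the past month (22 trading days). Here is a minimal NumPy sketch estimated by OLS. It runs on a synthetic series simulated from known coefficients, because no real RV data are assumed here. Extra regressors, such as rolling skewness or kurtosis, would be added as further columns of the design matrix. The helper names are hypothetical.

```python
import numpy as np

def har_design(rv):
    """Rows [1, daily, weekly, monthly] at day t and target RV_{t+1}.
    weekly = mean of the last 5 days, monthly = mean of the last 22 days (both include day t)."""
    rv = np.asarray(rv, float)
    rows, target = [], []
    for t in range(21, len(rv) - 1):
        rows.append([1.0, rv[t], rv[t - 4:t + 1].mean(), rv[t - 21:t + 1].mean()])
        target.append(rv[t + 1])
    return np.array(rows), np.array(target)

def har_fit(rv):
    X, y = har_design(rv)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta   # [intercept, beta_daily, beta_weekly, beta_monthly]

def har_forecast(rv, beta):
    """One-step-ahead forecast from the latest daily, weekly and monthly averages."""
    rv = np.asarray(rv, float)
    x = np.array([1.0, rv[-1], rv[-5:].mean(), rv[-22:].mean()])
    return float(x @ beta)

# Synthetic series from a known HAR process (illustrative coefficients, not estimates)
rng = np.random.default_rng(0)
rv = np.ones(3000)
for t in range(21, len(rv) - 1):
    rv[t + 1] = (0.1 + 0.4 * rv[t] + 0.3 * rv[t - 4:t + 1].mean()
                 + 0.2 * rv[t - 21:t + 1].mean() + rng.normal(0, 0.05))

beta = har_fit(rv)
forecast = har_forecast(rv, beta)
print(beta, forecast)
```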
Forecasting the Misery Index, follow-up
Five months ago I generated forecasts for the Eurozone Misery Index using the “FitAR” package in R. With several models differing in memory length (how many lags each model considers), I generated 24-month-ahead forecasts. It is interesting to see how accurate the forecasts turned out to be. The previous post has been updated and a few bugs in the code corrected. The updated data is public and can be found here. The index is the sum of the inflation rate and the unemployment rate in the Eurozone.
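The post used FitAR in R. As a rough stand-in, here is a minimal NumPy sketch of the same evaluation idea:

1. Fit AR(p) models of different memory length by plain OLS (not FitAR's estimation method).
2. Iterate each model 24 steps ahead.
3. Score the forecasts by RMSE and MAE against the held-out realized values.

The series is synthetic and the helper names are hypothetical.

```python
import numpy as np

def fit_ar(y, p):
    """OLS fit of y_t = c + phi_1 y_{t-1} + ... + phi_p y_{t-p} + e_t; returns [c, phi_1, ..., phi_p]."""
    y = np.asarray(y, float)
    lags = [y[p - k:len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(len(y) - p)] + lags)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef

def forecast_ar(y, coef, h):
    """Iterate the fitted recursion h steps ahead, feeding forecasts back in as lags."""
    history = list(np.asarray(y, float))
    p = len(coef) - 1
    out = []
    for _ in range(h):
        recent = history[-1:-p - 1:-1]           # y_T, y_{T-1}, ..., y_{T-p+1}
        f = coef[0] + float(np.dot(coef[1:], recent))
        out.append(f)
        history.append(f)
    return np.array(out)

def rmse(forecast, actual):
    return float(np.sqrt(np.mean((np.asarray(forecast) - np.asarray(actual)) ** 2)))

def mae(forecast, actual):
    return float(np.mean(np.abs(np.asarray(forecast) - np.asarray(actual))))

# Synthetic monthly series; the last 24 months are held out as "realized" values
rng = np.random.default_rng(7)
y = np.full(240, 10.0)
for t in range(1, len(y)):
    y[t] = 1.0 + 0.9 * y[t - 1] + rng.normal(scale=0.3)
train, actual = y[:-24], y[-24:]

for p in (1, 3, 6, 12):
    fc = forecast_ar(train, fit_ar(train, p), h=24)
    print(p, rmse(fc, actual), mae(fc, actual))
```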
Information Criteria for Autoregression
Some knowledge about the bootstrapping procedure is assumed.
In time series analysis, information criteria can be found under every green tree. These are functions that help you determine when to stop adding explanatory variables to your model.
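Here is a minimal NumPy sketch of order selection for an autoregression. It fits AR(0) through AR(max_p) by OLS on a common estimation sample and computes, up to constants under a Gaussian likelihood:

- AIC = n·log(RSS/n) + 2k
- BIC = n·log(RSS/n) + k·log(n)

The data come from a simulated AR(2), and the function name is hypothetical.

```python
import numpy as np

def ar_information_criteria(y, max_p):
    """AIC and BIC for AR(0..max_p) fitted by OLS on the same estimation sample."""
    y = np.asarray(y, float)
    n = len(y) - max_p                      # common sample: drop the first max_p points for every order
    target = y[max_p:]
    results = {}
    for p in range(max_p + 1):
        cols = [np.ones(n)] + [y[max_p - k:len(y) - k] for k in range(1, p + 1)]
        X = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        rss = float(np.sum((target - X @ coef) ** 2))
        k = p + 1                           # intercept plus p autoregressive coefficients
        results[p] = (n * np.log(rss / n) + 2 * k,           # AIC
                      n * np.log(rss / n) + k * np.log(n))   # BIC
    return results

# Synthetic AR(2): y_t = 0.5 y_{t-1} - 0.3 y_{t-2} + e_t
rng = np.random.default_rng(11)
y = np.zeros(500)
for t in range(2, len(y)):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

ic = ar_information_criteria(y, max_p=8)
aic_order = min(ic, key=lambda p: ic[p][0])
bic_order = min(ic, key=lambda p: ic[p][1])
print("AIC picks", aic_order, "BIC picks", bic_order)
```

For n ≥ 8, BIC's penalty per parameter (log n) exceeds AIC's (2). So on a common sample, BIC never selects a longer model than AIC.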
Bootstrapping time series – R code
Bootstrapping in its general form (the “ordinary” bootstrap) relies on IID observations, an assumption that underpins the theory behind it. Time series, however, are a different animal: bootstrapping them requires a somewhat different procedure that preserves the dependence structure.
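Here is a minimal NumPy sketch of the moving block bootstrap, one common way to preserve that dependence. Instead of resampling single points, it resamples overlapping blocks of consecutive observations and glues them together. The block length of 20, the AR(1) test series and the function name are assumptions for illustration. Choosing the block length is a problem of its own.

```python
import numpy as np

def moving_block_bootstrap(x, block_len, rng):
    """One bootstrap series built from randomly chosen overlapping blocks of length block_len."""
    x = np.asarray(x)
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)   # any start with a full block
    return np.concatenate([x[s:s + block_len] for s in starts])[:n]

# Positively autocorrelated AR(1) series (synthetic)
rng = np.random.default_rng(5)
x = np.zeros(500)
for t in range(1, len(x)):
    x[t] = 0.7 * x[t - 1] + rng.normal()

B = 1000
block_means = [moving_block_bootstrap(x, 20, rng).mean() for _ in range(B)]
iid_means = [rng.choice(x, size=len(x), replace=True).mean() for _ in range(B)]
print("block bootstrap s.e. of the mean:", np.std(block_means))
print("iid bootstrap s.e. of the mean:  ", np.std(iid_means))   # too small: ignores autocorrelation
```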
Spurious Regression Illustrated
The spurious regression problem dates back to Yule (1926): “Why Do We Sometimes Get Nonsense Correlations between Time-series?”. Let’s see what the problem is and how we can fix it. I use the Morgan Stanley (MS) ticker for illustration, over a pre-crisis time span. Take a look at the following figure, generated by regressing the actual prices of MS on the actual prices of the S&P. When we use actual prices, we call it a regression in levels (as in price levels), as opposed to a regression on log-transformed prices or returns.
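The classic illustration needs no stock data: regress one simulated random walk on another, independent one. Below is a minimal NumPy sketch using synthetic data and hypothetical helper names. It counts how often the slope looks "significant" at the nominal 5% level (|t| > 1.96), first in levels and then in first differences.

```python
import numpy as np

def slope_tstat(y, x):
    """t statistic of the slope in an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])

def rejection_rates(n=200, reps=300, seed=0):
    """Share of |t| > 1.96 for two independent random walks, in levels and in first differences."""
    rng = np.random.default_rng(seed)
    levels = diffs = 0
    for _ in range(reps):
        x = np.cumsum(rng.normal(size=n))
        y = np.cumsum(rng.normal(size=n))
        levels += abs(slope_tstat(y, x)) > 1.96
        diffs += abs(slope_tstat(np.diff(y), np.diff(x))) > 1.96
    return levels / reps, diffs / reps

levels_rate, diffs_rate = rejection_rates()
print("levels:", levels_rate)      # far above the nominal 0.05
print("differences:", diffs_rate)  # close to 0.05
```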
Piecewise Regression
The beta of a stock generally describes its relation with the market: the percentage move we should expect from the stock when the market moves one percent.
The market, being a somewhat vague notion, is approximated here, as usual, by the S&P 500. This relation (henceforth, beta) is central to many aspects of trading and risk management. It is already well established that volatility has different dynamics in rising and in declining markets. Recently, I read a few papers suggesting that the same holds true for beta, specifically that beta is not the same for rising markets and for declining markets. Since we use regression to estimate beta anyway, piecewise linear regression fits right in for an investor or speculator who wishes to accommodate this asymmetry.
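Here is a minimal sketch of one simple piecewise specification, with the knot at a zero market return:

r_stock = alpha + beta_down·min(r_m, 0) + beta_up·max(r_m, 0) + e

It is estimated by OLS with NumPy. The knot location, the synthetic returns and the "true" betas of 1.3 and 0.8 are assumptions for illustration, not results.

```python
import numpy as np

def up_down_beta(r_stock, r_market):
    """OLS fit of r_stock = alpha + beta_down * min(r_m, 0) + beta_up * max(r_m, 0) + e."""
    r_market = np.asarray(r_market, float)
    down = np.minimum(r_market, 0.0)    # market return on down days, 0 otherwise
    up = np.maximum(r_market, 0.0)      # market return on up days, 0 otherwise
    X = np.column_stack([np.ones_like(r_market), down, up])
    coef, *_ = np.linalg.lstsq(X, np.asarray(r_stock, float), rcond=None)
    return coef                         # [alpha, beta_down, beta_up]

rng = np.random.default_rng(2)
r_m = rng.normal(0.0, 0.01, size=2000)                          # synthetic daily market returns
r_s = 1.3 * np.minimum(r_m, 0) + 0.8 * np.maximum(r_m, 0) + rng.normal(0.0, 0.005, size=2000)
alpha, beta_down, beta_up = up_down_beta(r_s, r_m)
print(beta_down, beta_up)
```

Both pieces meet at zero, so the fitted line is continuous. Comparing beta_down with beta_up, together with their standard errors, indicates whether the asymmetry is more than noise.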
