In the past, I wrote about robust regression, an important tool for handling outliers in the data. Roger Koenker is a substantial contributor in this area. His website is full of useful information and code, so visit it when you have the time. The paper that drew my attention is “Quantile Autoregression”, found under his research tab; it is a significant extension to the time series domain. Here you will find a short demonstration of what you can do with quantile autoregression in R.
Albert Schweitzer said: “Example is not the main thing in influencing others. It is the only thing.” So I start with one.
Assume you have a variable y, which has an expectation and a variance. The expectation is often modeled using linear regression, so that E(y) equals $\beta_0 +\beta_1x$. The variability in y originates in the residual. Standard econometrics courses start with the simple notion of “constant variance”: the variance of the disturbances is steady and is unrelated to any of the explanatory variables chosen to model the expectation. This is called the homoskedasticity assumption. In real life, it rarely holds. Courses should start with the heteroskedasticity assumption, as this is the prevalent state of the world. In almost any situation you will encounter, the variance of the dependent variable is not constant; it matters at which x we want to determine the variance of y.
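To make the point about non-constant variance concrete, here is a minimal sketch on simulated data. The data-generating process (a disturbance whose standard deviation is proportional to x) is my assumption, chosen only for illustration; it is not taken from any real dataset.

```r
set.seed(1)
n <- 2000
x <- runif(n, 1, 10)

# Assumed form of heteroskedasticity: the disturbance sd grows linearly with x
y <- 1 + 2 * x + rnorm(n, sd = 0.5 * x)

# Ordinary linear regression models the expectation only
fit <- lm(y ~ x)
res <- resid(fit)

# Residual spread for small versus large values of x
sd_low  <- sd(res[x < 4])
sd_high <- sd(res[x > 7])
c(low_x = sd_low, high_x = sd_high)
```

The fitted line is the same kind of line you would get under homoskedasticity, but the spread of the residuals around it depends on where along x you look.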
OpenCPU is a great project. A few months back, I wrote a function for plotting a moving window of the market average correlation. Jeroen C.L. Ooms was nice enough to upload it to their server. Something has now changed: quotes now return as a character class, as opposed to numeric. This messes up the function and the plot does not render. I don’t wish to disturb Jeroen C.L. Ooms again with the correction for the code (despite his kind replies in the past). This problem creates the opportunity to look at the glistening “Shiny” package. I used it to (quickly..) build an app for the plot. You can now view a live correlation plot with the moving window of your choice. Live, as the app requests current market data. The width of the window for the correlation calculation is given as an input parameter.
The post has two goals:
(1) Explain how to forecast volatility using a simple Heterogeneous Auto-Regressive (HAR) model. (Corsi, 2002)
(2) Check if higher moments like Skewness and Kurtosis add forecast value to this model.
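As a rough sketch of goal (1), the HAR regression explains tomorrow's realized volatility with today's value, the average over the last week and the average over the last month. The series below is simulated as a placeholder, and the 5-day and 22-day windows are the usual convention that I assume here; the names `trail_mean`, `har_data` and `har_fit` are mine, for illustration only.

```r
set.seed(42)
n <- 1000

# Placeholder realized-volatility series: persistent and positive (simulated, not real data)
log_rv <- as.numeric(arima.sim(list(ar = 0.95), n = n, sd = 0.2))
rv <- exp(log_rv - 4)

# Trailing mean over the last k observations, including the current day
trail_mean <- function(z, k) as.numeric(stats::filter(z, rep(1 / k, k), sides = 1))

rv_w <- trail_mean(rv, 5)    # weekly component (assumed 5 trading days)
rv_m <- trail_mean(rv, 22)   # monthly component (assumed 22 trading days)

# Regress tomorrow's RV on today's daily, weekly and monthly components
idx <- 22:(n - 1)
har_data <- data.frame(
  y = rv[idx + 1],
  d = rv[idx],
  w = rv_w[idx],
  m = rv_m[idx]
)
har_fit <- lm(y ~ d + w + m, data = har_data)
coef(har_fit)
```

For goal (2), one would add further regressors (for example lagged skewness and kurtosis measures) to `har_data` and compare out-of-sample forecast errors with and without them.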
A few days ago I dropped my iPhone and cracked it. Though the iPhone still works, I decided it would be good to have a backup of my contacts on my desktop. A fancy backup can be achieved with the following two-step procedure: first, syncing your contacts' information with Facebook, and second, sending yourself an Excel file with the full details of your mobile contacts: phone number, date of birth, home page, work address and other details extracted from their Facebook page. The process takes only a few minutes and is free.
Volatility is unobserved. Hence we need to use an observed quantity as a proxy. Every once in a while I still see people using the squared daily return as a proxy. However, there is ample evidence that it is a bad one. Bad in the sense that it is noisy, which means that although on average it is a good estimate, on any individual day the estimate can be very far from the actual unobserved volatility. Here is a figure of the alleged standard deviation in the form of (the square root of) the squared daily return for the recent year:
On many days, this noisy estimate suggests that the volatility was around 2% or more. To me, it does not make much sense. The series is the S&P 500, so a move of 3% is a BIG one. You can also see how “jumpy” the series is. The figure illustrates why we should avoid using this estimate.
Five months ago I generated forecasts for the Eurozone Misery index. I used the “FitAR” package in R. Using models that differ in their memory length (how many lags were considered for each model), 24-months-ahead forecasts were generated. It might be interesting to see how accurate the forecasts are. The previous post is updated and a few bugs in the code were corrected. The updated data is public and can be found here. The index is the sum of the inflation rate and the unemployment rate in the Euro-zone area.
In portfolio management, risk management and derivative pricing, volatility plays an important role. So important, in fact, that you can find more volatility models than you can handle (Wikipedia link). The next step is to check how well each model performs, in and out of sample. Here are three simple things you can do:
In the last few decades there has been tremendous progress in the realm of volatility estimation. A major step is the additional use of the intraday price path. It has been shown that estimates which consider intraday information are more accurate, which is to say they converge faster to the unobserved value of the true volatility.
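The gain from intraday information can be illustrated with a small simulation. This is a sketch under strong assumptions of mine: a constant true daily volatility of 1%, 78 five-minute returns per day, and no microstructure noise. It compares the squared daily return with realized variance (the sum of squared intraday returns) as estimates of the true daily variance.

```r
set.seed(7)
n_days <- 250
m <- 78          # assumed number of 5-minute returns in a trading day
sigma <- 0.01    # assumed constant true daily volatility (1%)

# Intraday returns: each has variance sigma^2 / m, so a day sums to variance sigma^2
intraday <- matrix(rnorm(n_days * m, sd = sigma / sqrt(m)), nrow = n_days)

daily_ret <- rowSums(intraday)
sq_ret <- daily_ret^2          # squared daily return proxy
rv <- rowSums(intraday^2)      # realized variance from intraday returns

# Root mean squared error of each proxy against the true daily variance
rmse <- function(est) sqrt(mean((est - sigma^2)^2))
c(squared_return = rmse(sq_ret), realized_variance = rmse(rv))
```

Both proxies are right on average in this setup, but the realized variance is far less noisy on any individual day.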
Do doctors unnecessarily prolong colonoscopy? The answer is: they surely might.
In the post “Pairs trading issues”, one of the problems raised was the unstable estimates of the stock’s beta with respect to the market. Here is a suggestion for a possible solution, which is not really a solution but more stuff to do to make you feel less stupid when trading based on your fragile estimates.
Some knowledge about the bootstrapping procedure is assumed.
In time series analysis, Information Criteria can be found under every green tree. These are functions that help you determine when to stop adding explanatory variables to your model.
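As a minimal sketch of how such criteria are used, the snippet below fits autoregressions with one to six lags on the same sample and reports AIC and BIC for each. The series is a simulated AR(2), which is my assumption for illustration; the lag with the smallest criterion value is the one that criterion recommends.

```r
set.seed(3)
y <- as.numeric(arima.sim(list(ar = c(0.5, 0.3)), n = 500))  # simulated AR(2)

max_lag <- 6
# Column 1 is y_t, column j + 1 is y_{t-j}; all fits share the same rows
lags <- embed(y, max_lag + 1)

ic <- t(sapply(1:max_lag, function(p) {
  fit <- lm(lags[, 1] ~ lags[, 2:(p + 1), drop = FALSE])
  c(lags = p, AIC = AIC(fit), BIC = BIC(fit))
}))
ic

best_aic <- ic[which.min(ic[, "AIC"]), "lags"]
best_bic <- ic[which.min(ic[, "BIC"]), "lags"]
c(AIC = best_aic, BIC = best_bic)
```

BIC penalizes each extra lag more heavily than AIC at this sample size, so it never picks a longer model than AIC does.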
Bootstrapping in its general form (“ordinary” bootstrap) relies on IID observations, an assumption that underpins the theory backing it. However, time series are a different animal, and bootstrapping time series requires a somewhat different procedure to preserve the dependency structure.
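One common way to preserve the dependency structure is to resample blocks of consecutive observations instead of single observations. Below is a minimal sketch of a moving block bootstrap in base R; the helper `block_boot`, the block length of 20 and the simulated AR(1) series are my assumptions for illustration. The `boot` package offers `tsboot` for the same purpose.

```r
set.seed(11)
y <- as.numeric(arima.sim(list(ar = 0.6), n = 300))  # simulated dependent series

# Moving block bootstrap: glue together randomly chosen overlapping blocks
block_boot <- function(z, block_len) {
  n <- length(z)
  n_blocks <- ceiling(n / block_len)
  # Random start points; each block is block_len consecutive observations
  starts <- sample.int(n - block_len + 1, n_blocks, replace = TRUE)
  idx <- unlist(lapply(starts, function(s) s:(s + block_len - 1)))
  z[idx[seq_len(n)]]  # trim to the original length
}

# Bootstrap distribution of the sample mean
boot_means <- replicate(1000, mean(block_boot(y, block_len = 20)))
sd(boot_means)  # bootstrap standard error of the mean
```

Within each block the original ordering is kept, so short-range dependence survives the resampling; the block length controls how much of it is retained.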
The summary function in R returns:
Min. 1st Qu. Median Mean 3rd Qu. Max.
9.14 10.70 11.10 11.30 12.10 13.60
For the univariate case I wrote what I consider to be a better summary function which returns:
usum(x) # For univariate Summary
min med mean max sd skew kurt
1 9.14 11.13 11.35 13.65 1.057 0.3028 -0.6389
No NA's in the series
13.65 13.55 13.08 13.13