In the past, I wrote about robust regression, an important tool for handling outliers in the data. Roger Koenker is a substantial contributor in this area. His website is full of useful information and code, so visit it when you have the time. The paper that drew my attention is “Quantile Autoregression”, found under his research tab; it is a significant extension to the time series domain. Here you will find a short demonstration of what you can do with quantile autoregression in R.
Albert Schweitzer said: “Example is not the main thing in influencing others. It is the only thing.” So I start with one.
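For readers who want the mechanics before the demonstration, here is a minimal sketch of the check (pinball) loss that quantile regression minimizes, applied to a first-order autoregression. It is written in plain Python rather than R, and the function names `check_loss` and `qar1_loss` are hypothetical, not taken from any package. It only evaluates the loss for given coefficients; the minimization itself is left to a proper solver.

```python
def check_loss(residual, tau):
    """Check (pinball) loss for one residual at quantile level tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def qar1_loss(series, tau, intercept, slope):
    """Average check loss of a first-order quantile autoregression.

    The tau-th conditional quantile of y_t is modelled as
    intercept + slope * y_{t-1}.
    """
    residuals = [
        y - (intercept + slope * y_prev)
        for y_prev, y in zip(series[:-1], series[1:])
    ]
    return sum(check_loss(r, tau) for r in residuals) / len(residuals)


if __name__ == "__main__":
    # Toy series that follows y_t = 1 + 0.5 * y_{t-1} exactly.
    series = [0.0]
    for _ in range(8):
        series.append(1.0 + 0.5 * series[-1])
    print(qar1_loss(series, 0.5, 1.0, 0.5))  # the generating coefficients
    print(qar1_loss(series, 0.5, 0.0, 0.5))  # a wrong intercept
```

The asymmetry in `check_loss` is the whole trick: with tau above 0.5, positive residuals cost more than negative ones, which pushes the fitted line up toward the upper quantiles.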
OpenCPU is a great project. A few months back, I wrote a function for plotting a moving window of the market average correlation. Jeroen C.L. Ooms was nice enough to upload it to their server. Something has now changed: quotes are returned as a character class, as opposed to numeric. This breaks the function and the plot does not render. I don’t wish to disturb Jeroen C.L. Ooms again with a correction for the code (despite his kind replies in the past). The problem creates an opportunity to look at the glistening “Shiny” package. I used it to (quickly..) build an app for the plot. You can now view a live correlation plot with the moving window of your choice. It is live in the sense that the app requests current market data. The width of the window for the correlation calculation is given as an input parameter.
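The calculation behind such a plot can be sketched in a few lines. The version below is in plain Python rather than R and is not the code behind the app; the function names are hypothetical. It assumes equal-length return series keyed by ticker, and it includes the coercion from character quotes to numbers that the original function was missing.

```python
import math


def to_numeric(quotes):
    """Coerce quotes that arrive as strings (e.g. "101.5") to floats."""
    return [float(q) for q in quotes]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rolling_average_correlation(returns, window):
    """Average pairwise correlation over each moving window.

    `returns` maps a ticker to its return series; all series share one length.
    """
    names = sorted(returns)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    length = len(returns[names[0]])
    averages = []
    for end in range(window, length + 1):
        start = end - window
        corrs = [
            pearson(returns[a][start:end], returns[b][start:end])
            for a, b in pairs
        ]
        averages.append(sum(corrs) / len(corrs))
    return averages
```

The `window` argument plays the role of the app's input parameter: a wider window gives a smoother but slower-moving line.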
The post has two goals:
(1) Explain how to forecast volatility using a simple Heterogeneous Auto-Regressive (HAR) model (Corsi, 2002).
(2) Check if higher moments like Skewness and Kurtosis add forecast value to this model.
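To make goal (1) concrete, here is a minimal sketch of the HAR idea in plain Python (the post itself works in R). Tomorrow's realized volatility is regressed on today's value, the average over the last week and the average over the last month. The window lengths of 5 and 22 trading days are the conventional choices and are assumed here, and the function names are hypothetical.

```python
def har_rows(rv, week=5, month=22):
    """Build HAR regressors from a realized-volatility series.

    Each row is [1, daily, weekly average, monthly average]; the target is
    the next day's realized volatility.
    """
    rows, targets = [], []
    for t in range(month - 1, len(rv) - 1):
        daily = rv[t]
        weekly = sum(rv[t - week + 1:t + 1]) / week
        monthly = sum(rv[t - month + 1:t + 1]) / month
        rows.append([1.0, daily, weekly, monthly])
        targets.append(rv[t + 1])
    return rows, targets


def ols(rows, targets):
    """Least squares via the normal equations and Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def har_forecast(rv, beta, week=5, month=22):
    """One-step-ahead forecast from the most recent day, week and month."""
    weekly = sum(rv[-week:]) / week
    monthly = sum(rv[-month:]) / month
    return beta[0] + beta[1] * rv[-1] + beta[2] * weekly + beta[3] * monthly
```

With a realized-volatility series `rv` in hand, `beta = ols(*har_rows(rv))` estimates the four coefficients and `har_forecast(rv, beta)` gives tomorrow's forecast. For goal (2), one possible approach is to append realized skewness and kurtosis as extra columns in each row and check whether the out-of-sample error drops.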
Volatility is unobserved, so we need to use an observed quantity as a proxy. Every once in a while I still see people using the squared daily return as a proxy, yet there is ample evidence that it is a bad one. It is bad in the sense that it is noisy: although on average it is a good estimate, on any individual day it can be very far from the actual unobserved volatility. Here is a figure of the alleged standard deviation in the form of (square root of the) squared daily return for the recent year:
You can see that on many days this noisy estimate suggests that the volatility was around 2% or more. To me, that does not make much sense. The series is the S&P 500, so a move of 3% is a BIG one. You can also see how “jumpy” the series is. The figure illustrates why we should avoid using this estimate.
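The same point can be made with a small simulation, sketched here in plain Python. It assumes, purely for illustration, that the true daily volatility is a constant 1% and that returns are normally distributed. Neither assumption comes from the S&P 500 data.

```python
import random

random.seed(1)

TRUE_SIGMA = 0.01  # assumed constant "true" daily volatility of 1%
N_DAYS = 100_000

returns = [random.gauss(0.0, TRUE_SIGMA) for _ in range(N_DAYS)]
proxy = [abs(r) for r in returns]  # square root of the squared daily return

# On average the squared return recovers the true variance...
mean_squared = sum(r * r for r in returns) / N_DAYS
ratio_to_true_variance = mean_squared / TRUE_SIGMA ** 2

# ...but the daily proxy is frequently far from the true 1%.
share_above_double = sum(p > 2 * TRUE_SIGMA for p in proxy) / N_DAYS
share_below_half = sum(p < 0.5 * TRUE_SIGMA for p in proxy) / N_DAYS

print(f"mean r^2 / sigma^2: {ratio_to_true_variance:.3f}")
print(f"share of days with proxy > 2 * sigma: {share_above_double:.3f}")
print(f"share of days with proxy < sigma / 2: {share_below_half:.3f}")
```

Under these assumptions the first ratio comes out close to one, yet on roughly 38% of days the proxy reports less than half the true volatility and on roughly 5% of days more than double it. That is the "jumpy" behavior in the figure, even though here the true volatility never moves.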
Five months ago I generated forecasts for the Eurozone Misery index, using the “FitAR” package in R. I generated 24-months-ahead forecasts from several models that differ in their memory length (how many lags each model considers). It might be interesting to see how accurate the forecasts are. The previous post is updated and a few bugs in the code are corrected. The updated data is public and can be found here. The index is the sum of the inflation rate and the unemployment rate in the Eurozone.
In portfolio management, risk management and derivative pricing, volatility plays an important role. So important, in fact, that you can find more volatility models than you can handle (Wikipedia link). The natural next step is to check how well each model performs, in and out of sample. Here are three simple things you can do:
In the post “pairs trading issues”, one of the problems raised was the unstable estimates of the stock’s beta with respect to the market. Here is a suggestion for a possible solution, which is not really a solution but more stuff to do to make you feel less stupid when trading based on your fragile estimates.
Some knowledge about the bootstrapping procedure is assumed.
In time series analysis, Information Criteria can be found under every green tree. These are functions that help you determine when to stop adding explanatory variables to your model.
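As a minimal sketch of the idea, here are the two best-known criteria, AIC and BIC, in their Gaussian-regression form (up to an additive constant), written in plain Python. The residual sums of squares in the example are made up for illustration and the function names are hypothetical.

```python
import math


def aic(rss, n, k):
    """Akaike criterion for a Gaussian regression, up to an additive constant."""
    return n * math.log(rss / n) + 2 * k


def bic(rss, n, k):
    """Bayesian (Schwarz) criterion; penalizes each parameter by log(n)."""
    return n * math.log(rss / n) + k * math.log(n)


def best_model(rss_by_k, n, criterion):
    """Return the parameter count whose criterion value is smallest."""
    return min(rss_by_k, key=lambda k: criterion(rss_by_k[k], n, k))


if __name__ == "__main__":
    # Hypothetical residual sums of squares for models with 1 to 4 parameters.
    rss_by_k = {1: 120.0, 2: 80.0, 3: 79.5, 4: 79.4}
    print(best_model(rss_by_k, 100, aic))
    print(best_model(rss_by_k, 100, bic))
```

Both criteria trade fit (the first term falls as the residual sum of squares falls) against complexity (the second term rises with every added variable). In the made-up example the third and fourth parameters barely improve the fit, so the penalty outweighs the gain.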
Is Miss Stagflation coming to visit?
The Misery index is the sum of the inflation and unemployment rates. We would like them both to stay naturally low, and we are miserable when they are not. The index is currently floating around its record-scratching levels. In this post I demonstrate the use of the nice FitAR package in R to fit an AR model and see what we can expect accordingly. Inflation and unemployment numbers concerning the Eurozone (17 countries) can be found here.
Have a look at the index over time:
When you google “Kurtosis”, you encounter many formulas to help you calculate it, talk about how this measure is used to evaluate the “peakedness” of your data, maybe some other measures to help you do so, maybe all of a sudden a side step towards Skewness, and how both Skewness and Kurtosis are higher moments of the distribution. This is all very true, but maybe you just want to understand what Kurtosis means and how to interpret this measure, similarly to the way you interpret standard deviation (the average distance from the average). Here I take a shot at giving a more intuitive interpretation.
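As a point of reference for those formulas, here is a minimal sketch in plain Python of the most common one: the fourth central moment divided by the squared variance. The function name is hypothetical and the population (divide by n) moments are an assumed convention; software packages differ in their small-sample corrections.

```python
def kurtosis(data):
    """Sample kurtosis: fourth central moment over squared variance.

    A normal distribution has kurtosis 3, so `kurtosis(data) - 3` is the
    usual "excess" kurtosis.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2


if __name__ == "__main__":
    print(kurtosis([-1.0, 1.0, -1.0, 1.0]))        # all points equally far out
    print(kurtosis([0.0] * 8 + [-3.0, 3.0]))       # mostly calm, two extremes
```

Raising deviations to the fourth power means the few most extreme observations dominate the numerator, which is why the second toy sample scores much higher than the first.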
For those of you who are into machine learning, here you can find a cool collection of databases to play around with your favorite algorithm. I choose one out of the available 200 and fit a logistic regression model. The idea is to see what kind of properties are common for those who earn above 50K a year. Our data is such that the “y” variable is binary. A value of 1 is given if the individual earns above 50K and 0 if below. We know many things about the individual. Level of education in years, age, is she married, where from, which sector is she working in, how many working hours per week, race, and more. We can fit logistic regression, which is quite standard for a binary dependent variable, and see which variables are important.
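For a feel of what fitting such a model involves, here is a minimal sketch of logistic regression by gradient ascent, in plain Python rather than R. It is not the code used for the income data; the function names, learning rate and step count are assumptions, and a real analysis would rely on a tested routine such as R's `glm`.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, rate=0.1, steps=5000):
    """Fit coefficients by gradient ascent on the average log-likelihood.

    Each row must start with 1.0 so that the first coefficient is the intercept.
    """
    coefs = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(coefs)
        for row, y in zip(rows, labels):
            error = y - sigmoid(sum(c * v for c, v in zip(coefs, row)))
            for j, v in enumerate(row):
                grad[j] += error * v / n
        coefs = [c + rate * g for c, g in zip(coefs, grad)]
    return coefs


def predict(coefs, row):
    """Estimated probability that the label is 1 for one observation."""
    return sigmoid(sum(c * v for c, v in zip(coefs, row)))
```

Each row would hold a 1.0 for the intercept followed by the individual's characteristics (education, age, hours worked and so on), with the label set to 1 for those earning above 50K. A positive fitted coefficient then means that a higher value of that variable raises the estimated probability of being in the high-income group.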
A few weeks back I gave a talk about backtesting trading strategies with R. I got a few requests for the slides, so here they are:
This is not investment advice!!
A couple of weeks back, during the amst-R-dam user group talk on backtesting trading strategies using R, I mentioned that the most effective style for hedge funds is relative value statistical arbitrage. I had read it somewhere. After the talk was over, I was no longer sure it was correct to say so, and decided to check.
Bootstrap your way into robust inference. Wow, that was fun to write..
Say you ran a simple regression, and now you have your $\hat{\beta}$. You wish to know if it is significantly different from (say) zero. In general, people look at the statistic or p.value reported by their software of choice (heRe). The thing is, this p.value calculation relies on the distribution of your dependent variable. Your software assumes a normal distribution if not told differently. How so? For example, the (95%) confidence interval is $\hat{\beta} \pm 1.96 \times sd(\hat{\beta})$, and the 1.96 comes from the normal distribution.
It is advisable not to do that. The beauty of bootstrapping* is that it is distribution untroubled: it is valid for a dependent variable which is Gaussian, Cauchy, or whatever. You can defend yourself against misspecification, and/or use the tool for inference when the underlying distribution is unknown.
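A minimal sketch of the idea, in plain Python rather than R: resample the (x, y) pairs with replacement, re-estimate the slope each time, and read the confidence interval off the percentiles of those estimates instead of using 1.96. The pairs bootstrap and the percentile interval are assumed choices for illustration (a residual bootstrap is a common alternative), and the function names are hypothetical.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def bootstrap_slope_interval(x, y, reps=2000, level=0.95, seed=0):
    """Percentile interval for the slope from a pairs bootstrap."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    while len(draws) < reps:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if max(xs) == min(xs):
            continue  # degenerate resample: slope undefined, draw again
        draws.append(slope(xs, [y[i] for i in idx]))
    draws.sort()
    tail = (1.0 - level) / 2.0
    lower = draws[int(tail * (reps - 1))]
    upper = draws[int((1.0 - tail) * (reps - 1))]
    return lower, upper
```

If zero falls outside the interval returned by `bootstrap_slope_interval(x, y)`, the slope is judged significantly different from zero at the chosen level, with no normality assumption behind the cutoffs.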