Bootstrapping time series – R code

Bootstrapping in its general form (“ordinary” bootstrap) relies on IID observations which staples the theory backing it. However, time series are a different animal and bootstrapping time series requires somewhat different procedure to preserve dependency structure.

More

Better summary function in R

The summary function in R returns:

For the univariate case I wrote what I consider to be a better summary function which returns:

More

Forecasting the Eurozone Misery index

Is Miss Stagflation coming to visit?
The Misery index is the sum of inflation and unemployment rate. We would like them both to stay naturally low, and we are miserable when they are not. The index is currently floating in it’s record scratching levels. In this post I demonstrate the use of the nice FitAR package in R to fit an AR model and see what we can expect accordingly. Inflation and unemployment numbers concerning the Eurozone (17 countries) can be found here.
Have a look at the index over time:

More

Stock market Kurtosis over time

In the last decade we have observed an increase in computational power, information availability, speed of execution and stock market competition in general. One might think that, as a result, we are prone to larger shocks that occur faster than what was common in the past. I crunched some numbers and was surprised to see that this is not the case.

More

Kurtosis Interpretation

When you google “Kurtosis”, you encounter many formulas to help you calculate it, talk about how this measure is used to evaluate the “peakedness” of your data, maybe some other measures to help you do so, maybe all of a sudden a side step towards Skewness, and how both Skewness and Kurtosis are higher moments of the distribution. This is all very true, but maybe you just want to understand what does Kurtosis mean and how to interpret this measure. Similarly to the way you interpret standard deviation (the average distance from the average). Here I take a shot at giving a more intuitive interpretation.

More

Marriage is good for your income

For those of you who are into machine learning, here you can find a cool collection of databases to play around with your favorite algorithm. I choose one out of the available 200 and fit a logistic regression model. The idea is to see what kind of properties are common for those who earn above 50K a year. Our data is such that the “y” variable is binary. A value of 1 is given if the individual earns above 50K and 0 if below. We know many things about the individual. Level of education in years, age, is she married, where from, which sector is she working in, how many working hours per week, race, and more. We can fit logistic regression, which is quite standard for a binary dependent variable, and see which variables are important.

More

Most profitable hedge fund style

This is not an investment advice!!

Couple of weeks back, during amst-R-dam user group talk on backtesting trading strategies using R, I mentioned the most effective style for hedge funds is relative value statistical arbitrage, I read it somewhere. After the talk was over, I was not sure anymore if it was correct to say it and decided to check it.

More

Bootstrap example

Bootstrap your way into robust inference. Wow, that was fun to write..

Introduction
Say you made a simple regression, now you have your  \widehat{\beta} . You wish to know if it is significantly different from (say) zero. In general, people look at the statistic or p.value reported by their software of choice, (heRe). Thing is, this p.value calculation relies on the distribution of your dependent variable. Your software assumes normal distribution if not told differently, how so? for example, the (95%) confidence interval is  \widehat{\beta} \pm 1.96 \times sd( \widehat{\beta}) , the 1.96 comes from the normal distribution.
It is advisable not to do that, the beauty in bootstrapping* is that it is distribution untroubled, it’s valid for dependent which is Gaussian, Cauchy, or whatever. You can defend yourself against misspecification, and\or use the tool for inference when the underlying distribution is unknown.

More

Europe most dangerous cities

When I was searching for data about U.S prison population, for another post, I ran across eurostat, a nice source for data to play around with. I pooled some numbers, specifically homicides recorded by the police. A panel data for 36 cities over time, from 2000 to 2009. Lets see which are the cities that have problems in this area.

More

U.S. prison population

I recently finished reading: The author writes that 6.6% of U.S. American residents will find themselves at some point in their life incarcerated, about 20 million people. A big number on anyone’s scale. You can also find disturbing figures in Wikipedia: figure. Are these facts misleading? we need to account for population growth. The prison population should naturally rise even if the proportion of crime in the general population is constant, since the population itself is growing. Here I show that these facts are NOT misleading, and that the system is indeed not fulfilling its purpose.

More

Spurious Regression illustrated

Spurious Regression problem dates back to Yule (1926): “Why Do We Sometimes Get Nonsense Correlations between Time-series?”. Lets see what is the problem, and how can we fix it. I am using Morgan Stanley (MS) symbol for illustration, pre-crisis time span.  Take a look at the following figure, generated from the regression of MS on the S&P, actual prices of the stock, actual prices of the S&P, when we use actual prices we term it regression in levels, as in price levels, as oppose to log transformed or returns.

More