Very.

If you have 10 possible independent regressors, *none of which matter*, you still have a good chance of finding that at least one looks important.
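To put a rough number on "a good chance", here is a minimal sketch. It assumes each regressor is tested separately at the 5% level and that the tests are independent; both the 5% level and the function names are my assumptions, not part of the original post.

```python
import random

def prob_any_false_positive(n_tests, alpha=0.05):
    # Chance that at least one of n independent tests rejects
    # when every null hypothesis is actually true.
    return 1 - (1 - alpha) ** n_tests

def simulate_any_false_positive(n_tests, alpha=0.05, n_sims=20000, seed=0):
    # Under the null, each p-value is uniform on [0, 1].
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_sims

print(round(prob_any_false_positive(10), 3))
```

With 10 irrelevant regressors the analytic answer is about 0.40, so "significant" findings show up in roughly four out of ten such exercises.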

A vector autoregression (VAR) process can be represented in a couple of ways. The usual form is as follows:
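For reference, the reduced form usually given in textbooks for a VAR of order $p$ is sketched below. The notation ($c$, $A_i$, $\varepsilon_t$) is my assumption and may differ from the notation used in the full post.

$$
y_t = c + A_1 y_{t-1} + A_2 y_{t-2} + \dots + A_p y_{t-p} + \varepsilon_t
$$

Here $y_t$ is a $k \times 1$ vector of variables, $c$ is a $k \times 1$ vector of intercepts, each $A_i$ is a $k \times k$ coefficient matrix, and $\varepsilon_t$ is a $k \times 1$ vector of disturbances.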

In the past, I wrote about robust regression, an important tool for handling outliers in the data. Roger Koenker is a substantial contributor in this area. His website is full of useful information and code, so visit it when you have time. The paper that drew my attention is “Quantile Autoregression”, found under his research tab; it is a significant extension to the time series domain. Here you will find a short demonstration of what you can do with quantile autoregression in R.

Albert Schweitzer said: “Example is not the main thing in influencing others. It is the only thing.” So I start with one.

Assume you have a variable *y*, which has an expectation and a variance. The expectation is often modeled using linear regression, so that $E(y \mid x) = \beta_0 + \beta_1 x$. The variability of *y* around that expectation comes from the residual. Standard econometrics courses start with the simple notion of “constant variance”: the variance of the disturbances is steady and unrelated to any of the explanatory variables chosen to model the expectation. This is called the homoskedasticity assumption. In real life it rarely holds. Courses should start with heteroskedasticity, as this is the prevalent state of the world. In almost any situation you will encounter, the variance of the dependent variable is not constant; the variance of *y* depends on the value of *x* at which we evaluate it.
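A small simulation makes the point concrete. This is a sketch on made-up data: the coefficients (1 and 2), the rule that the disturbance's standard deviation equals 0.5 times *x*, and the split point of 5.5 are all assumptions chosen for illustration.

```python
import random
import statistics

def simulate_heteroskedastic(n=4000, seed=1):
    # y = 1 + 2x + e, where the sd of e grows with x (assumed: 0.5 * x).
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(1.0, 10.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 0.5 * x)
        data.append((x, y))
    return data

def residual_variance_by_half(data, b0=1.0, b1=2.0, split=5.5):
    # Residuals are taken around the true line to keep the sketch short.
    low = [y - (b0 + b1 * x) for x, y in data if x < split]
    high = [y - (b0 + b1 * x) for x, y in data if x >= split]
    return statistics.pvariance(low), statistics.pvariance(high)

low_var, high_var = residual_variance_by_half(simulate_heteroskedastic())
print(low_var, high_var)
```

The residual variance for large *x* comes out several times bigger than for small *x*, which is exactly what the constant-variance assumption rules out.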

The post has two goals:

**(1)** Explain how to forecast volatility using a simple Heterogeneous Auto-Regressive (HAR) model (Corsi, 2002).

**(2)** Check if higher moments like Skewness and Kurtosis add forecast value to this model.
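As a rough sketch of what the HAR model regresses on: tomorrow's realized volatility is explained by today's value, its average over the past week, and its average over the past month. The window lengths of 5 and 22 trading days are the conventional choice and are an assumption here, as is the function name.

```python
def har_features(rv, week=5, month=22):
    # rv: list of daily realized volatility values, oldest first.
    # Each row is (daily, weekly average, monthly average, next-day target).
    rows = []
    for t in range(month - 1, len(rv) - 1):
        daily = rv[t]
        weekly = sum(rv[t - week + 1 : t + 1]) / week
        monthly = sum(rv[t - month + 1 : t + 1]) / month
        rows.append((daily, weekly, monthly, rv[t + 1]))
    return rows
```

The rows can then be fed to any least squares routine; adding Skewness and Kurtosis for goal (2) would simply mean appending extra columns.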

In the last few decades there has been tremendous progress in the realm of volatility estimation. A major step is the additional use of the intraday price path. It has been shown that estimates which consider intraday information are more accurate, which is to say they converge faster to the true, unobserved volatility.
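One common estimator that uses the intraday path is realized variance: the sum of squared intraday log returns. The sketch below assumes that definition; the post may well use a different estimator, and the function name is hypothetical.

```python
import math

def realized_variance(prices):
    # prices: intraday prices for one day, in time order.
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return sum(r * r for r in log_returns)
```

A day that ends where it started can still show high realized variance if the price moved a lot in between, which is the information a close-to-close estimate throws away.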

Some knowledge about the bootstrapping procedure is assumed.

In time series analysis, information criteria can be found under every green tree. These are functions that help you determine when to stop adding explanatory variables to your model.
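Two of the best-known criteria are AIC and BIC. The sketch below uses their usual form for a Gaussian linear model, dropping additive constants; it is an illustration under that assumption rather than the specific criteria the post goes on to discuss.

```python
import math

def aic(rss, n, k):
    # rss: residual sum of squares, n: observations, k: estimated parameters.
    return n * math.log(rss / n) + 2 * k

def bic(rss, n, k):
    # Same fit term as AIC, but the penalty per parameter grows with log(n).
    return n * math.log(rss / n) + k * math.log(n)
```

Lower is better for both. An extra variable is only worth keeping if the drop in the fit term outweighs the added penalty, and BIC asks for a bigger drop than AIC once *n* exceeds about 8.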

Bootstrapping in its general form (the “ordinary” bootstrap) relies on IID observations, an assumption that underpins the theory behind it. However, time series are a different animal, and bootstrapping them requires a somewhat different procedure to preserve the dependence structure.
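One such procedure is the moving block bootstrap, which resamples whole blocks of consecutive observations instead of single points. The following is a minimal sketch of that idea; the function name and the choice of block length are assumptions, and the post may cover other variants.

```python
import random

def moving_block_bootstrap(series, block_len, rng):
    # Draw overlapping blocks of consecutive observations with replacement
    # and glue them together until the original length is reached.
    n = len(series)
    n_starts = n - block_len + 1
    out = []
    while len(out) < n:
        start = rng.randrange(n_starts)
        out.extend(series[start : start + block_len])
    return out[:n]

sample = moving_block_bootstrap(list(range(20)), 4, random.Random(0))
print(sample)
```

Within each block the original ordering survives, so short-range dependence is kept; only the joins between blocks break it.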

Is Miss Stagflation coming to visit?

The Misery index is the sum of the inflation and unemployment rates. We would like both to stay naturally low, and we are miserable when they are not. The index is currently hovering around record levels. In this post I demonstrate the use of the nice *FitAR* package in R to fit an AR model and see what we can expect accordingly. Inflation and unemployment numbers for the Eurozone (17 countries) can be found here.
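To show the mechanics of fitting an AR model and projecting it forward, here is a bare-bones AR(1) fitted by least squares. It is not the *FitAR* package and makes no attempt to select the lag order; the function names are hypothetical and the series in the example is made up.

```python
def fit_ar1(y):
    # Least squares fit of y[t] = c + phi * y[t-1] + e[t].
    x, z = y[:-1], y[1:]
    mx = sum(x) / len(x)
    mz = sum(z) / len(z)
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    phi = sxz / sxx
    c = mz - phi * mx
    return c, phi

def forecast_ar1(last, c, phi, steps):
    # Iterate the fitted equation forward from the last observed value.
    out = []
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out

# Made-up series that follows y[t] = 2 + 0.5 * y[t-1] exactly.
y = [10.0]
for _ in range(10):
    y.append(2.0 + 0.5 * y[-1])
c, phi = fit_ar1(y)
print(c, phi, forecast_ar1(y[-1], c, phi, 3))
```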

Have a look at the index over time:

In the last decade we have observed an increase in computational power, information availability, speed of execution and stock market competition in general. One might think that, as a result, we are prone to larger shocks that occur faster than what was common in the past. I crunched some numbers and was surprised to see that this is not the case.

When you google “Kurtosis”, you encounter many formulas for calculating it, talk about how the measure is used to evaluate the “peakedness” of your data, maybe some other measures to help you do so, maybe a sudden side step towards Skewness, and how both Skewness and Kurtosis are *higher moments* of the distribution. This is all very true, but maybe you just want to understand what Kurtosis means and how to interpret it, similarly to the way you interpret the standard deviation (the average distance from the average). Here I take a shot at giving a more intuitive interpretation.
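For reference, the standard moment definition can be written in a few lines: standardize each observation, raise it to the fourth power, and average. This sketch uses the population version with no small-sample correction, which is an assumption on my part; it is not necessarily the interpretation the post arrives at.

```python
import statistics

def kurtosis(xs):
    # Average fourth power of the z-scores (about 3 for normal data).
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / sd) ** 4 for x in xs])

print(kurtosis([-2, -1, 0, 1, 2]))
```

Because of the fourth power, observations within one standard deviation of the mean contribute almost nothing, while those far out dominate the average.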

For those of you who are into machine learning, here you can find a cool collection of databases to try your favorite algorithm on. I chose one of the 200 available and fit a logistic regression model. The idea is to see which properties are common among those who earn above 50K a year. The “y” variable is binary: 1 if the individual earns above 50K and 0 otherwise. We know many things about the individual: level of education in years, age, whether she is married, where she is from, which sector she works in, how many hours she works per week, race, and more. We can fit a logistic regression, which is quite standard for a binary dependent variable, and see which variables are important.
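To show what fitting a logistic regression on a binary “y” involves, here is a toy version with a single explanatory variable, fitted by plain gradient descent. The data are synthetic (a standardized "education" score with assumed true coefficients of -1 and 1.5), not the income dataset from the post, and all names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_toy_data(n=500, seed=3, b0=-1.0, b1=1.5):
    # Synthetic data: y = 1 with probability sigmoid(b0 + b1 * x).
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(b0 + b1 * x) else 0 for x in xs]
    return xs, ys

def fit_logistic(xs, ys, lr=0.5, n_iter=1000):
    # Gradient descent on the average negative log-likelihood.
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1
```

A positive fitted slope means higher values of the variable go with a higher chance of being in the above-50K group; with many variables the same reading applies coefficient by coefficient.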

Bootstrap your way into robust inference. Wow, that was fun to write.

**Introduction**

Say you ran a simple regression and now you have your $\hat{\beta}$. You wish to know if it is significantly different from (say) zero. In general, people look at the t-statistic or p-value reported by their software of choice (heRe). The thing is, this p-value calculation relies on the distribution of your dependent variable. Your software assumes a normal distribution if not told otherwise. How so? For example, the (95%) confidence interval is $\hat{\beta} \pm 1.96 \times SE(\hat{\beta})$, and the 1.96 comes from the normal distribution.

It is advisable not to lean on that assumption. The beauty of bootstrapping is that it is distribution-free: it is valid whether the dependent variable is Gaussian, Cauchy, or whatever. You can *defend* yourself against misspecification, and/or use the tool for inference when the underlying distribution is unknown.
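Here is a minimal sketch of one way to do this: a pairs bootstrap with a percentile interval for the slope. Resampling (x, y) pairs and using percentiles are my choices for illustration, not necessarily the scheme used in the post, and the toy data (true slope 2, Laplace-type noise) are made up.

```python
import random

def ols_slope(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slope_ci(xs, ys, n_boot=2000, level=0.95, seed=0):
    # Resample (x, y) pairs with replacement, refit, and read the
    # interval off the sorted bootstrap slopes. No 1.96 involved.
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    slopes.sort()
    lower = slopes[int(n_boot * (1 - level) / 2)]
    upper = slopes[int(n_boot * (1 + level) / 2) - 1]
    return lower, upper

def toy_data(n=200, seed=42):
    # Made-up data: y = 1 + 2x + non-normal (Laplace-type) noise.
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.expovariate(1.0) - rng.expovariate(1.0) for x in xs]
    return xs, ys
```

Calling `bootstrap_slope_ci(*toy_data())` returns a lower and an upper bound; if zero lies outside them, the slope is significantly different from zero at the chosen level.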

The spurious regression problem dates back to Yule (1926): “Why Do We Sometimes Get Nonsense Correlations between Time-series?”. Let's see what the problem is and how we can fix it. I am using the Morgan Stanley (MS) symbol for illustration, over a pre-crisis time span. Take a look at the following figure, generated from the regression of MS on the S&P, using the *actual prices* of the stock and the *actual prices* of the S&P. When we use actual prices we call it a regression in levels, as in price levels, as opposed to log-transformed prices or returns.
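The problem can be reproduced without any market data. The sketch below simulates pairs of independent random walks, which by construction have nothing to do with each other, and compares their correlation in levels with the correlation of their first differences. The simulated series stand in for price levels and are an assumption for illustration; they are not the MS and S&P data.

```python
import math
import random

def correlation(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab / math.sqrt(saa * sbb)

def random_walk(n, rng):
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def diff(xs):
    return [b - a for a, b in zip(xs, xs[1:])]

def mean_abs_corr(n_sims=200, n=200, seed=0):
    # Average |correlation| across simulated pairs: levels vs differences.
    rng = random.Random(seed)
    levels, diffs = [], []
    for _ in range(n_sims):
        a, b = random_walk(n, rng), random_walk(n, rng)
        levels.append(abs(correlation(a, b)))
        diffs.append(abs(correlation(diff(a), diff(b))))
    return sum(levels) / n_sims, sum(diffs) / n_sims

print(mean_abs_corr())
```

In levels the unrelated series routinely look strongly correlated, while in differences the correlation stays close to zero, which is why working with returns rather than prices is the usual fix.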