Statistics and Econometrics - 8/9

My favourite statistician

Blog, Miscellaneous, Statistics and EconometricsPosted on 02/17/2014

We are all standing on the shoulders of giants. Bradley Efron is one such giant. With the invention of the bootstrap in 1979 and later with his very influential 2004 paper about the Least Angle Regression (and the accompanied software written in R).

Don’t believe anything you read

Blog, Miscellaneous, Statistics and EconometricsPosted on 01/21/2014

I just finished reading An estimate of the science-wise false discovery rate and application to the top medical literature. The authors ask how many of what we read is scientific journals is actually incorrect, or false.

What is overfitting?

Blog, Statistics and EconometricsPosted on 12/23/2013

Overfitting is strongly related to variable selection. It is a common problem and a tough one, best explained by way of example.

Omitted Variable Bias

Blog, Statistics and EconometricsPosted on 09/30/2013

Frequently, we see the term ‘control variables’. The researcher introduces dozens of explanatory variables she has no interest in. This is done in order to avoid the so-called ‘Omitted Variable Bias’.

What is Omitted Variable Bias?

In general, OLS estimator has great properties, not the least important is the fact that for a finite number of observations you can faithfully retrieve the marginal effect of X on Y, that is $E(\widehat{\beta}) = \beta$ . This is very much not the case when you have a variable that should be included in the model but is left out. As in my previous posts about Multicollinearity and heteroskedasticity, I only try to provide the intuition since you are probably familiar with the result itself.

Bayesian vs. Frequentist in Practice (cont’d)

Blog, Statistics and EconometricsPosted on 09/23/2013

Few weeks back I simulated a model and made the point that in practice, the difference between Bayesian and Frequentist is not large. Here I apply the code to some real data; a model for Industrial Production (IP).

Bayesian vs. Frequentist in Practice

Blog, Statistics and EconometricsPosted on 08/28/2013

Rivers of ink have been spilled over the ‘Bayesian vs. Frequentist’ dispute. Most of us were trained as Frequentists. Probably because the computational power needed for Bayesian analysis was not around when the syllabus of your statistical/econometric courses was formed. In this age of tablets and fast internet connection, your training does not matter much, you can easily transform between the two approaches, engaging the right webpages/communities. I will not talk about the ideological differences between the two, or which approach is more appealing and why. Larry Wasserman already gave an excellent review.

Understanding Multicollinearity

Blog, R, Statistics and EconometricsPosted on 06/12/2013

Roughly speaking, Multicollinearity occurs when two or more regressors are highly correlated. As with heteroskedasticity, students often know what does it mean, how to detect it and are taught how to cope with it, but not why is it so. From Wikipedia: “In this situation (Multicollinearity) the coefficient estimates may change erratically in response to small changes in the model or the data.” The Wikipedia entry continues to discuss detection, implications and remedies. Here I try to provide the intuition.

How Important is Variable Selection?

Blog, R, Statistics and EconometricsPosted on 05/22/2013

Very.

If you have 10 possible independent regressors, and none of which matter, you have a good chance to find at least one is important.

Moving Average Representation of VAR

Blog, R, Statistics and EconometricsPosted on 03/10/2013

A vector autoregression (VAR) process can be represented in a couple of ways. The usual form is as follows:

Quantile Autoregression in R

Blog, Finance and Trading, R, Risk, Statistics and EconometricsPosted on 02/09/2013

In the past, I wrote about robust regression. This is an important tool which handles outliers in the data. Roger Koenker is a substantial contributor in this area. His website is full of useful information and code so visit when you have time for it. The paper which drew my attention is “Quantile Autoregression” found under his research tab, it is a significant extension to the time series domain. Here you will find short demonstration for stuff you can do with quantile autoregression in R.

On p-value

Blog, R, Statistics and EconometricsPosted on 02/02/2013

Albert Schweitzer said: “Example is not the main thing in influencing others. It is the only thing.”, so I start with it.

Heteroskedasticity tests

Blog, Statistics and EconometricsPosted on 12/27/2012

Assume you have a variable y, which has an expectation and a variance. The expectation is often modeled using linear regression so that E(y) equals, on average, $\beta_0 +\beta_1x$. The origin of the variability in y is the residual. Now, standard econometric courses start with the simple notion of “constant variance”, which means that the variance of the disturbances is steady and is not related to any of the explanatory variables that were chosen to model the expectation, this is called homoskedasticity assumption. In fact, in real life it is rarely the case. Courses should start with the heteroskedasticity assumption as this is the prevalent state of the world. In almost any situation you will encounter, the variance of the dependent variable is not constant, it matters what is the x for which we want to determine the variance of y.

A Simple Model for Realized Volatility

Blog, Finance and Trading, R, Statistics and EconometricsPosted on 12/09/2012

The post has two goals:

(1) Explain how to forecast volatility using a simple Heterogeneous Auto-Regressive (HAR) model. (Corsi, 2002)
(2) Check if higher moments like Skewness and Kurtosis add forecast value to this model.

Intraday volatility measures

Blog, Code, Finance and Trading, Risk, Statistics and EconometricsPosted on 09/08/2012

In the last few decades there has been tremendous progress in the realm of volatility estimation. A major step is the additional use of intraday price path. It has been shown that estimates which consider intraday information are more accurate. Which is to say they converge faster to the real unobserved value of the true volatility.

Information Criteria for Autoregression

Blog, Statistics and EconometricsPosted on 08/15/2012

Some knowledge about the bootstrapping procedure is assumed.
In time series analysis, Information Criteria can be found under every green tree. These are function to help you determine when to stop adding explanatory variables to your model.