Omitted Variable Bias

Frequently, we see the term ‘control variables’. The researcher introduces dozens of explanatory variables she has no interest in. This is done in order to avoid the so-called ‘Omitted Variable Bias’.

What is Omitted Variable Bias?

In general, the OLS estimator has great properties. Not the least important is that, even with a finite number of observations, it recovers the marginal effect of X on Y on average, that is, E(\widehat{\beta}) = \beta. This is very much not the case when a variable that should be included in the model is left out. As in my previous posts about multicollinearity and heteroskedasticity, I only try to provide the intuition, since you are probably familiar with the result itself.

For illustration, consider the model

    \begin{align*} y_t = x_{1,t}+x_{2,t} + \varepsilon_t,  \end{align*}

So Y is the sum, i.e. \beta_1 = \beta_2 = 1. What happens to our estimate for \beta_1 when we do not include x_2 in the model? Mathematically we get the simple result*:

    \begin{align*} E(\widehat{\beta}_1)   = \beta_1 + \beta_2 \times \widehat{\gamma}_1. \end{align*}

The second term on the RHS is bad news. \widehat{\gamma}_1 is the estimate of the coefficient from the (hypothetical) equation

    \begin{align*} x_{2,t} = \gamma_0 + \gamma_1 x_{1,t} + \nu_t.  \end{align*}
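For readers who want to see where the result comes from, here is a sketch of the derivation. It assumes the short regression of y on x_1 includes an intercept and treats the X’s as fixed; the sample mean \bar{x}_1 is notation introduced here for illustration. Substituting the true model into the usual slope formula gives

    \begin{align*}
    \widehat{\beta}_1
      &= \frac{\sum_t (x_{1,t}-\bar{x}_1)\, y_t}{\sum_t (x_{1,t}-\bar{x}_1)^2} \\
      &= \beta_1
         + \beta_2 \frac{\sum_t (x_{1,t}-\bar{x}_1)\, x_{2,t}}{\sum_t (x_{1,t}-\bar{x}_1)^2}
         + \frac{\sum_t (x_{1,t}-\bar{x}_1)\, \varepsilon_t}{\sum_t (x_{1,t}-\bar{x}_1)^2} \\
      &= \beta_1 + \beta_2 \times \widehat{\gamma}_1
         + \frac{\sum_t (x_{1,t}-\bar{x}_1)\, \varepsilon_t}{\sum_t (x_{1,t}-\bar{x}_1)^2}.
    \end{align*}

The ratio multiplying \beta_2 is exactly the OLS slope from regressing x_2 on x_1, which is \widehat{\gamma}_1. If \varepsilon_t has mean zero given the X’s, the last term vanishes in expectation, and what remains is \beta_1 + \beta_2 \times \widehat{\gamma}_1.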

In words, the term \beta_2 \times \widehat{\gamma}_1 represents the bias. It is influenced by:
1. The real, unknown value of \beta_2. If the real effect of x_2 on Y is small in absolute value, it pushes the combined term toward zero and the bias is small.
2. How closely x_2 is related to x_1. This is less trivial. If x_2 has nothing to do with x_1, and you are lucky enough that the estimate \widehat{\gamma}_1 shows it, the second factor goes to zero and the bias is small. You need to be lucky because the estimate (and hence the bias) depends on the actual sample you have: you can be unlucky and get an estimate that is large in absolute value even when the X’s are independent at the population level. Classical textbooks state this subtlety more precisely.
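The two ingredients above can be checked numerically. The following is a minimal sketch, not the code behind this post: the sample size, the 0.5 link between x_1 and x_2, the unit-variance normal noise, and the helper name `slope` are all assumptions made for illustration. The X’s are held fixed, only the noise is redrawn, and the average of the estimates from the model that omits x_2 is compared with \beta_1 + \beta_2 \times \widehat{\gamma}_1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta1, beta2 = 200, 1.0, 1.0

# Fixed regressors; x2 is built to be correlated with x1 (assumed design).
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)


def slope(x, y):
    """OLS slope from regressing y on a constant and x (hypothetical helper)."""
    xc = x - x.mean()
    return xc @ (y - y.mean()) / (xc @ xc)


# Auxiliary regression of x2 on x1 gives gamma_1_hat for this sample.
gamma1_hat = slope(x1, x2)

# Redraw only the noise, omit x2 each time, and collect the estimates.
short_estimates = np.array([
    slope(x1, beta1 * x1 + beta2 * x2 + rng.normal(size=n))
    for _ in range(5000)
])

mean_short = short_estimates.mean()
predicted = beta1 + beta2 * gamma1_hat
print(f"average estimate when x2 is omitted: {mean_short:.3f}")
print(f"beta_1 + beta_2 * gamma_1_hat:       {predicted:.3f}")
```

The two printed numbers should agree closely, and both should sit well away from the true \beta_1 = 1. Setting the 0.5 to 0 shrinks the bias, but only to whatever \widehat{\gamma}_1 happens to be in that particular sample.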

Now, why is this so? The unaccounted-for influence of x_2 on Y pushes through anyway. Mr. x_2 tells himself that if he is out, he is going to do what he can from the outside. He talks to x_1 and, depending on (as in the dry formulas) how muscular Mr. x_2 is (the real value of \beta_2) and the nature of their relationship (as in 2 above), x_1 is going to accommodate x_2’s request. If they do not know each other, i.e. the correlation is zero, x_1 will ignore this harassment and the bias is unlikely to be strong.

Illustration of Omitted Variable Bias

For a couple more important insights, I need an illustration:

[Figure: Omitted Variable Bias]
Having 500 instead of 50 observations does absolutely nothing to mitigate this problem. That is to say, the estimate is not only biased but also inconsistent, which means we cannot get around this problem easily.
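Here is a rough sketch of that point in code. It is not the simulation behind the figure: the population link of 0.5 between x_1 and x_2, the sample sizes, and the function name `short_estimate` are assumptions chosen for illustration. Each replication draws a fresh sample, omits x_2, and records the estimate of \beta_1.

```python
import numpy as np

rng = np.random.default_rng(1)


def short_estimate(n, gamma1=0.5):
    """Estimate beta_1 from y on x1 only, although y = x1 + x2 + noise."""
    x1 = rng.normal(size=n)
    x2 = gamma1 * x1 + rng.normal(size=n)  # x2 is related to x1
    y = x1 + x2 + rng.normal(size=n)       # true beta_1 = beta_2 = 1
    xc = x1 - x1.mean()
    return xc @ y / (xc @ xc)


results = {}
for n in (50, 500, 5000):
    est = np.array([short_estimate(n) for _ in range(2000)])
    results[n] = (est.mean(), est.std())
    print(f"n={n:5d}  mean={results[n][0]:.3f}  sd={results[n][1]:.3f}")
```

Under these assumptions the mean should stay near 1.5 rather than 1 for every sample size. More observations only tighten the estimates around the wrong value.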

Finally, have a look at this absurd situation. I plot the standard deviation of the estimate when there is a bias and when there isn’t:

[Figure: Standard Deviation when OV]
In black, the standard deviation when the model is correctly specified. The parabolic shape is due to multicollinearity. In red is the (estimated) standard deviation of the incorrectly specified model.

It is absurd. When the bias is small, around zero, we find it harder to estimate the parameter (the standard deviation of the estimate is relatively high). On the other hand, when the bias is strong, the standard deviation is lower. When the model is more severely misspecified, we get a seemingly more accurate estimate. Talk about dangerous inference.
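The pattern can be reproduced with a small Monte Carlo. This is a sketch under assumed settings, not the code that produced the plot: unit-variance normal regressors with correlation rho, n = 100, 1000 replications, and the function name `simulate` are all illustrative choices. It also reports the Monte Carlo standard deviation of the estimates, whereas the red line in the plot is the estimated standard deviation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 1000


def simulate(rho):
    """Return (sd of beta_1 with x2 included, sd with x2 omitted, bias when omitted)."""
    long_b1, short_b1 = [], []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        # x2 has unit variance and correlation rho with x1.
        x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        y = x1 + x2 + rng.normal(size=n)
        X_long = np.column_stack([np.ones(n), x1, x2])
        X_short = X_long[:, :2]  # constant and x1 only
        long_b1.append(np.linalg.lstsq(X_long, y, rcond=None)[0][1])
        short_b1.append(np.linalg.lstsq(X_short, y, rcond=None)[0][1])
    return np.std(long_b1), np.std(short_b1), np.mean(short_b1) - 1.0


results = {rho: simulate(rho) for rho in (-0.9, -0.5, 0.0, 0.5, 0.9)}
for rho, (sd_long, sd_short, bias) in results.items():
    print(f"rho={rho:+.1f}  sd(correct)={sd_long:.3f}  "
          f"sd(omitted)={sd_short:.3f}  bias={bias:+.3f}")
```

With these settings the correctly specified model should show its standard deviation rising as |rho| grows, while the model that omits x_2 should show the opposite: the larger the bias, the smaller the spread.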

* Derivation on page 149 of this book.
