Measurement error bias

What is measurement error bias?

Errors-in-variables, or measurement error bias, arises when a right-hand-side variable (your x in a y_t = \alpha + \beta x_t + \varepsilon_t model) is measured with error. If x represents the price of a liquid stock, then it is accurately measured because trading is so frequent. But if x is a volatility, well, it is not accurately measured. We simply don’t yet have the power to tame that variable.

Unlike the price itself, volatility estimates change with our choice of measurement method. Since no model is a perfect depiction of reality, we have a measurement error problem on our hands.

Ignoring measurement errors leads to biased estimates and, good God, inconsistent estimates.

What seems to be the problem, officer?

Formally, the true model is y_t = \beta_0 + \beta_1 x_t + \varepsilon_t, but while we think we are using x_t we are really using x_t + noise_t. Written in terms of the variable we actually use, the regression becomes:

    \[y_t = \beta_0 + \beta_1 x_t + \varepsilon_t  = \beta_0 + \beta_1(x_t+noise_t) + \underbrace{\varepsilon_t - \beta_1 noise_t}_{\mbox{super residual}} .\]

The trouble is that this super residual contains the measurement noise itself, so it is correlated with the regressor we actually use.

Estimating \beta_1 in such a model with the usual regression gives
\widehat{\beta_1} = \frac{cov(x,y)}{var(x)}, which is the usual estimate for \beta_1, but since we measure x with noise

    \[\widehat{\beta_1} = \frac{ cov(x + noise,   \beta_0 + \beta_1 x + \varepsilon ) }  {var(x+noise)}.\]

Using the usual covariance rules, and assuming the noise in x has nothing to do with x itself, we have

    \[\widehat{\beta_1} = \frac{\beta_1 var(x)  + cov(noise , \varepsilon ) } {var(x) + var(noise) }.\]

Now if you also assume the noise in x has nothing to do with the noise in y, meaning cov(noise , \varepsilon) = 0, you are left with a faulty estimate. Faulty in the sense that it is misleading; you are after \beta_1 but what you get is

    \[\beta_1 \frac{var(x)} {var(x) + var(noise) }.\]

How bad your estimate is depends on how accurately x is measured: the noisier, the stronger the bias. Actually, a more accurate statement is that the noisier it is, compared with the original variance of x, the larger the bias.
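To make the attenuation concrete, here is a minimal simulation sketch in Python (not the post's code; the normal distributions, variances and sample size are arbitrary illustration choices). It regresses y on a noisy version of x and compares the OLS slope with the theoretical shrinkage factor:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta0, beta1 = 0.0, 2.0
var_x, var_noise = 1.0, 0.5                        # arbitrary illustration values

x = rng.normal(0, np.sqrt(var_x), n)               # true regressor
eps = rng.normal(0, 1, n)                          # regression error
y = beta0 + beta1 * x + eps                        # y depends on the *true* x
x_obs = x + rng.normal(0, np.sqrt(var_noise), n)   # what we actually observe

# OLS slope using the noisy regressor
beta1_hat = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)

# Theoretical attenuated value: beta1 * var(x) / (var(x) + var(noise))
print(beta1_hat, beta1 * var_x / (var_x + var_noise))   # both roughly 1.33, not 2
```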

We don’t see the number of observations anywhere in this derivation, and that is bad news: it means this bias is not going anywhere, at least not by adding more data.
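A quick way to convince yourself of that, under the same assumed setup as above, is to let the sample size grow and watch where the estimate settles:

```python
import numpy as np

rng = np.random.default_rng(2)
beta1, var_x, var_noise = 2.0, 1.0, 1.0     # attenuation factor = 1/2

for n in (100, 10_000, 1_000_000):
    x = rng.normal(0, np.sqrt(var_x), n)
    y = beta1 * x + rng.normal(0, 1, n)
    x_obs = x + rng.normal(0, np.sqrt(var_noise), n)
    slope = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)
    print(n, round(slope, 3))   # settles around 1.0 (= beta1 / 2), never at 2.0
```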

The quantity \frac{var(x)} {var(x) + var(noise)} is always smaller than 1. So our estimate is always weaker (smaller in absolute value) than it would otherwise be. This is called attenuation, or dilution. Why is this important? It matters for forecasting and for significance testing. The \widehat{\beta} may not be prominent enough to pass the test (because it is closer to zero than it should be), and we can go on working while wrongly dismissing that variable as having no real impact.
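A rough sketch of that testing consequence, again with made-up numbers: the same real effect passes a t-test when x is clean, but tends to fail once x is measured with heavy noise.

```python
import numpy as np

def slope_and_t(x, y):
    """OLS slope and its t-statistic for a simple regression of y on x."""
    n = len(x)
    xc, yc = x - x.mean(), y - y.mean()
    b1 = (xc @ yc) / (xc @ xc)
    resid = yc - b1 * xc
    se = np.sqrt((resid @ resid) / (n - 2) / (xc @ xc))
    return b1, b1 / se

rng = np.random.default_rng(3)
n, beta1 = 300, 0.3                      # a modest but real effect
x = rng.normal(0, 1, n)                  # true regressor, var(x) = 1
y = beta1 * x + rng.normal(0, 1, n)
x_obs = x + rng.normal(0, 3, n)          # heavy measurement noise, var(noise) = 9

print(slope_and_t(x, y))       # clean x: slope near 0.3, t-stat comfortably above 2
print(slope_and_t(x_obs, y))   # noisy x: slope shrinks toward 0.03, t-stat typically below 2
```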
[Figure: Measurement error bias]
The dashed line is the simulated parameter. The grey curve is the simulated distribution with no measurement errors. Notice how it is centered correctly. The green curve is simulated from an estimate with a ratio \frac{var(x)} {var(x) + var(noise) } of \frac{0.3}{1.3} and the blue curve is simulated from a ratio of \frac{1}{2}.

You can see that when the ratio \; \frac{var(x)} {var(x) + var(noise) }= \frac{1}{2}, the blue curve is already strongly attenuated. Something to keep in mind when making inference. Code is below.
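The original code is not reproduced here, so as a stand-in, here is a rough Python sketch that generates comparable sampling distributions; the variances are chosen only to match the quoted ratios, and the original post's settings may well differ.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
n_sim, n, beta1 = 2000, 300, 1.0

def simulate_slopes(var_x, var_noise):
    """Sampling distribution of the OLS slope when x is observed with noise."""
    slopes = np.empty(n_sim)
    for i in range(n_sim):
        x = rng.normal(0, np.sqrt(var_x), n)
        y = beta1 * x + rng.normal(0, 1, n)
        x_obs = x + rng.normal(0, np.sqrt(var_noise), n) if var_noise > 0 else x
        xc = x_obs - x_obs.mean()
        slopes[i] = (xc @ (y - y.mean())) / (xc @ xc)
    return slopes

cases = {                       # label: (var_x, var_noise)
    "no measurement error": (0.3, 0.0),
    "ratio 0.3/1.3":        (0.3, 1.0),
    "ratio 1/2":            (1.0, 1.0),
}
for label, (vx, vn) in cases.items():
    plt.hist(simulate_slopes(vx, vn), bins=50, density=True, alpha=0.5, label=label)
plt.axvline(beta1, linestyle="--", color="k")   # the simulated (true) parameter
plt.legend(); plt.show()
```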

2 comments on “Measurement error bias”

  1. Isn’t the challenge figuring out estimates of “var(x)/(var(x)+var(noise))” in actual empirical data? (And not just assuming it)
