What is measurement error bias?
Errors-in-variables, or the measurement error situation, happens when your right-hand-side variable(s), the $x$ in a model, is measured with error. If $x$ represents the price of a liquid stock, then it is accurately measured because trading is so frequent. But if $x$ is a volatility, well, it is not accurately measured. We simply don't yet have the power to tame this variable.
Unlike the price itself, volatility estimates change with our choice of measurement method. Since no model is a perfect depiction of reality, we have a measurement error problem on our hands.
Ignoring measurement errors leads to biased estimates and, good God, inconsistent estimates.
What seems to be the problem officer?
Formally, if we think we are using $x$ but really we are using $\tilde{x} = x + u$, where $u$ is the measurement noise, then the usual regression model $y = \beta_0 + \beta_1 x + \varepsilon$ becomes:

$$y = \beta_0 + \beta_1 \tilde{x} + (\varepsilon - \beta_1 u).$$

What we can see from the estimation of $\beta_1$ in such a model, using the usual regression, is

$$\widehat{\beta}_1 = \frac{\operatorname{cov}(\tilde{x}, y)}{\operatorname{var}(\tilde{x})},$$

which is the usual estimate for $\beta_1$, but since we measure $x$ with noise:

$$\widehat{\beta}_1 = \frac{\operatorname{cov}(x + u,\, y)}{\operatorname{var}(x + u)}.$$

Using the usual covariance rules, and assuming the noise in $x$ has nothing to do with $x$ itself, so $\operatorname{cov}(x, u) = 0$, we have

$$\widehat{\beta}_1 = \frac{\operatorname{cov}(x, y) + \operatorname{cov}(u, y)}{\operatorname{var}(x) + \operatorname{var}(u)}.$$

Now if you assume the noise in $x$ has nothing to do with the noise in $y$, meaning $\operatorname{cov}(u, \varepsilon) = 0$, then $\operatorname{cov}(u, y) = \beta_1 \operatorname{cov}(u, x) + \operatorname{cov}(u, \varepsilon) = 0$, and you are left with a faulty estimate. Faulty in the sense that it is misleading; you are after $\beta_1$ but you get

$$\widehat{\beta}_1 \;\to\; \beta_1 \, \frac{\operatorname{var}(x)}{\operatorname{var}(x) + \operatorname{var}(u)}.$$
How bad your estimate is depends on how accurately $x$ is measured: the noisier, the stronger the bias. Actually, a more accurate statement is that the larger the noise variance is relative to the original variance of $x$, the larger the bias.

We don't see the number of observations anywhere in this derivation, and that is bad news: it means this bias is not going anywhere, not by adding more data at least.
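To see this concretely, here is a minimal sketch (not from the original post) that fits the noisy-$x$ regression at increasing sample sizes. With $\operatorname{var}(x) = \operatorname{var}(u) = 1$ the estimate settles near the attenuated value $-2 \times 0.5 = -1$, not the true slope $-2$, no matter how large $n$ gets:

```r
# Bias does not vanish with more data: the slope estimate converges
# to beta * var(x) / (var(x) + var(u)) = -2 * 1 / (1 + 1) = -1,
# not to the true slope -2.
set.seed(1)
for (n in c(100, 10000, 1000000)) {
  x <- rnorm(n)
  xnoise <- x + rnorm(n)   # noise variance 1, same as var(x)
  y <- 2 - 2 * x + rnorm(n)
  cat("n =", n, " slope =", round(coef(lm(y ~ xnoise))[2], 3), "\n")
}
```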
The quantity $\frac{\operatorname{var}(x)}{\operatorname{var}(x) + \operatorname{var}(u)}$ is always smaller than 1. So our estimate is always weaker (smaller in absolute value) than it would otherwise be. This is called attenuation, or dilution. Why is this important? It matters for forecasting and for significance testing. The estimate may not be prominent enough to pass a significance test (because it is closer to zero than it should be), and we can continue working while wrongly dismissing that variable as having no real impact.
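For reference, the attenuation factors behind the figure below can be computed directly. Assuming $\operatorname{var}(x) = 1$ and the two noise levels used in the simulation code further down (standard deviations 0.3 and 1), the factors come out to roughly 0.92 and 0.5:

```r
# Theoretical attenuation factor var(x) / (var(x) + var(u))
# for the two noise levels used in the simulation below
attenuation <- function(var_x, sd_noise) var_x / (var_x + sd_noise^2)
attenuation(1, 0.3)  # ~0.92 : mild noise, mild attenuation
attenuation(1, 1)    # 0.5   : heavy noise, the slope is halved
```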
[Figure: simulated densities of the slope estimate under the three noise levels.] The dashed line is the simulated parameter. The grey curve is the simulated distribution with no measurement errors. Notice how it is centered correctly. The green curve is simulated from an estimate with a ratio of $1/(1 + 0.3^2) \approx 0.92$, and the blue curve is simulated from a ratio of $1/(1 + 1) = 0.5$.
You can see that the blue curve, the one with the smaller ratio, is much more strongly attenuated. Something to keep in mind when making inference. Code is below.
```r
#--------------------------------------------
# Simulation code for measurement error bias (attenuation)
#--------------------------------------------
RR <- 500
beta <- beta_biased_small <- beta_biased_large <- NULL
for (i in 1:RR) {
  x <- rnorm(10)
  eps <- rnorm(10)
  xnoise_small <- x + rnorm(10, 0, 0.3)  # mild measurement noise
  xnoise_large <- x + rnorm(10, 0, 1)    # heavy measurement noise
  y <- 2 - 2 * x + eps
  beta[i] <- lm(y ~ x)$coef[2]
  beta_biased_small[i] <- lm(y ~ xnoise_small)$coef[2]
  beta_biased_large[i] <- lm(y ~ xnoise_large)$coef[2]
}
den0 <- density(beta)
den_biased_small <- density(beta_biased_small)
den_biased_large <- density(beta_biased_large)
# Open the plot with the clean-x density, then overlay the noisy ones
plot(den0$y ~ den0$x, type = "l", lwd = 2, col = "grey", xlab = "", ylab = "",
     xlim = range(den0$x, den_biased_small$x, den_biased_large$x))
lines(den_biased_small$y ~ den_biased_small$x, col = 3, lwd = 2)  # green
lines(den_biased_large$y ~ den_biased_large$x, col = 4, lwd = 2)  # blue
abline(v = -2, col = 6, lwd = 4, lty = "dashed")  # true slope
```
Isn’t the challenge figuring out estimates of “var(x)/(var(x)+var(noise))” in actual empirical data? (And not just assuming it)
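Indeed, that ratio is not handed to us. One textbook remedy (a sketch, not from the original post) relies on replicate measurements: if we can observe two independent noisy versions of the same $x$, their covariance estimates $\operatorname{var}(x)$, while the variance of either one estimates $\operatorname{var}(x) + \operatorname{var}(u)$, so their ratio estimates the attenuation factor and can be used to correct the slope:

```r
# Estimating the reliability ratio var(x) / (var(x) + var(u))
# from two independent replicate measurements of the same x
set.seed(1)
n <- 10000
x <- rnorm(n)
w1 <- x + rnorm(n)   # first noisy measurement
w2 <- x + rnorm(n)   # second noisy measurement
y <- 2 - 2 * x + rnorm(n)
reliability <- cov(w1, w2) / var(w1)   # ~0.5 here
naive <- unname(coef(lm(y ~ w1))[2])   # attenuated slope, ~ -1
corrected <- naive / reliability       # ~ -2, close to the truth
c(reliability = reliability, naive = naive, corrected = corrected)
```

When replicates are not available, instrumental variables are the other standard route for the errors-in-variables problem.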