Bias vs. Consistency

Especially for undergraduate students but not just, the concepts of unbiasedness and consistency as well as the relation between these two are tough to get one’s head around. My aim here is to help with this. We start with a short explanation of the two concepts and follow with an illustration.

Unbiased estimator means that the distribution of the estimator is centered around the parameter of interest:

$E(estimator) = parameter$

for the usual least square estimator this means that $E(\widehat{\beta}) = \beta$ . Note this has nothing to do with the number of observation used in the estimation.

Explanation
Better to explain it with the contrast: What does a biased estimator mean?

An estimator depends on the observations you feed into it. For instance, $(X^ \prime X)^{-1}X^ \prime y$ depends on the sample (X,y). For different sample, you get different estimator $\widehat{\beta}$ . This means that the number you eventually get has a distribution. I think this is the biggest problem for graduate students. Most of them think about the average as a constant number, not as an estimate which has it’s own distribution. The average is sample dependent, and the mean is the real unknown parameter and is constant (Bayesians, keep your cool please), this distinction is never sharp enough.

A biased estimator means that the estimate we see comes from a distribution which is not centered around the real parameter. Why such estimators even exist? Why shouldn’t we correct the distribution such that the center of the distribution of the estimate exactly aligned with the real parameter? The answer is that the location of the distribution is important, that the middle of the distribution falls in line with the real parameter is important, but this is not all we care about. Our estimate comes from the single realization we observe, we also want that it will not be VERY far from the real parameter, so this has to do not with the location but with the shape.

Sometimes we are willing to trade the location for a ‘better’ shape, a tighter shape around the real unknown parameter. Imagine an estimator which is not centered around the real parameter (biased) so is more likely to ‘miss’ the real parameter by a bit, but is far less likely to ‘miss’ it by large margin, versus an estimator which is centered around the real parameter (unbiased) but is much more likely to ‘miss’ it by large margin and deliver an estimate far from the real parameter. You see, we do not know what is the impact of interest rate move on level of investment, we will never know it. We only have an estimate and we hope it is not far from the real unknown sensitivity. That is why we are willing to have this so-called bias-variance tradeoff, so that we reduce the chance to be unlucky in that the realization combined with the unbiased estimator delivers an estimate which is very far from the real parameter.

What does Consistency mean?

Maybe the estimator is biased, but if we increase the number of observation to infinity, we get the correct real number. Note that this concept has to do with the number of observations. For example the OLS estimator is such that (under some assumptions):

$\lim_{n \rightarrow \infty} pr(\vert \widehat{\beta} - \beta \vert) < \varepsilon = 0,$

meaning that it is consistent, since when we increase the number of observation the estimate we will get is very close to the parameter (or the chance that the difference between the estimate and the parameter is large (larger than epsilon) is zero). A mind boggling venture is to find an estimator that is unbiased, but when we increase the sample is not consistent (which would essentially mean that more data harms this absurd estimator).

Now, we have a 2 by 2 matrix,
1: Unbiased and consistent
2: Biased but consistent
3: Biased and also not consistent
4: Unbiased but not consistent

(1) In general, if the estimator is unbiased, it is most likely to be consistent and I had to look for a specific hypothetical example for when this is not the case (but found one so this can’t be generalized).
(2) Not a big problem, find or pay for more data
(3) Big problem – encountered often
(4) Could barely find an example for it

Illustration
Hopefully the following charts will help clarify the above explanation. The real parameter is set to 2 in all simulations, and is always represented by the green vertical line. The red vertical line is the average of a simulated 1000 replications.
1: Unbiased and Consistent


### Unbiased and Consistent
repet <- 1000
TT <- 20
beta <- NULL
for (i in 1:repet){
  x <- rnorm(TT)
  eps <- rnorm(TT,0,1)
  y=2+2*x+eps
  beta[i] <- lm(y~x)$coef[2]
}
histfun(beta, mainn= "Unbiased")
TT <- 1000
beta <- NULL
for (i in 1:repet){
  x <- rnorm(TT)
  eps <- rnorm(TT,0,1)
  y=2+2*x+eps
  beta[i] <- lm(y~x)$coef[2]
}
histfun(beta,mainn="Consistent")

### Unbiased and Consistent

repet <- 1000

TT <- 20

beta <- NULL

for (i in 1:repet){

x <- rnorm(TT)

eps <- rnorm(TT,0,1)

y=2+2*x+eps

beta[i] <- lm(y~x)$coef[2]

}

histfun(beta, mainn= "Unbiased")

TT <- 1000

beta <- NULL

for (i in 1:repet){

x <- rnorm(TT)

eps <- rnorm(TT,0,1)

y=2+2*x+eps

beta[i] <- lm(y~x)$coef[2]

}

histfun(beta,mainn="Consistent")

2: Biased but consistent


###  Biased But Consistent
TT <- 20
beta <- NULL
for (i in 1:repet){
  x <- rnorm(TT)
  eps <- rnorm(TT,0,1)
  y=2+2*x+eps
  beta[i] <- (10/TT)+lm(y~x)$coef[2] # Text book example
}
histfun(beta,mainn="Biased")
TT <- 1000
for (i in 1:repet){
  x <- rnorm(TT)
  eps <- rnorm(TT,0,1)
  y=2+2*x+eps
  beta[i] <- (10/TT)+lm(y~x)$coef[2] 
}
histfun(beta,mainn="Consistent")

### Biased But Consistent

TT <- 20

beta <- NULL

for (i in 1:repet){

x <- rnorm(TT)

eps <- rnorm(TT,0,1)

y=2+2*x+eps

beta[i] <- (10/TT)+lm(y~x)$coef[2] # Text book example

}

histfun(beta,mainn="Biased")

TT <- 1000

for (i in 1:repet){

x <- rnorm(TT)

eps <- rnorm(TT,0,1)

y=2+2*x+eps

beta[i] <- (10/TT)+lm(y~x)$coef[2]

}

histfun(beta,mainn="Consistent")

3: Biased and also not consistent, omitted variable bias


### Omitted Variable Bias: Biased and Inconsistent
TT <- 20
beta <- NULL
for (i in 1:repet){
  x <- rnorm(TT)
  x2 <- rnorm(TT)+.2*x # make it correlate a bit with x
  eps <- rnorm(TT,0,1)
  y=2+2*x+2*x2+eps
  beta[i] <- lm(y~x)$coef[2] # I do not include x2
}
histfun(beta,mainn="Biased")
TT <- 1000
beta <- NULL
for (i in 1:repet){
  x <- rnorm(TT)
  x2 <- rnorm(TT)+.2*x 
  eps <- rnorm(TT,0,1)
  y=2+2*x+2*x2+eps
  beta[i] <- lm(y~x)$coef[2] 
}
histfun(beta,mainn="Inconsistent")

### Omitted Variable Bias: Biased and Inconsistent

TT <- 20

beta <- NULL

for (i in 1:repet){

x <- rnorm(TT)

x2 <- rnorm(TT)+.2*x # make it correlate a bit with x

eps <- rnorm(TT,0,1)

y=2+2*x+2*x2+eps

beta[i] <- lm(y~x)$coef[2] # I do not include x2

}

histfun(beta,mainn="Biased")

TT <- 1000

beta <- NULL

for (i in 1:repet){

x <- rnorm(TT)

x2 <- rnorm(TT)+.2*x

eps <- rnorm(TT,0,1)

y=2+2*x+2*x2+eps

beta[i] <- lm(y~x)$coef[2]

}

histfun(beta,mainn="Inconsistent")

You see here why omitted variable bias for example, is such an important issue in Econometrics. The fact that you get the wrong estimate even if you increase the number of observation is very disturbing. On the obvious side since you get the wrong estimate and, which is even more troubling, you are more confident about your wrong estimate (low std around estimate).

4: Unbiased but not consistent – idiotic textbook example – other suggestions welcome


###  Unbiased But Inconsistent - Only example I am familiar with
# Other examples welcome
TT <- 20
beta <- NULL
for (i in 1:repet){
  x <- rnorm(TT,2,1)
  y = x
  beta[i] <- y[1]
}
histfun(beta,mainn="Unbiased")
TT <- 1000
  for (i in 1:repet){
    x <- rnorm(TT,2,1)
    y = x
    beta[i] <- y[1] # Ignore the effect of having Large TT 
  }
histfun(beta,mainn="Inconsistent")

### Unbiased But Inconsistent - Only example I am familiar with

# Other examples welcome

TT <- 20

beta <- NULL

for (i in 1:repet){

x <- rnorm(TT,2,1)

y = x

beta[i] <- y[1]

}

histfun(beta,mainn="Unbiased")

TT <- 1000

for (i in 1:repet){

x <- rnorm(TT,2,1)

y = x

beta[i] <- y[1] # Ignore the effect of having Large TT

}

histfun(beta,mainn="Inconsistent")

Appendix
Just a word regarding other possible confusion. Papers also use the term ‘consistent’ in regards to selection criteria. Selection criteria deliver estimate but for a structure, e.g. exact number of lags to be used in a time series. In those cases the parameter is the structure (for example the number of lags) and we say the estimator, or the selection criterion is consistent if it delivers the correct structure. For example the AIC does not deliver the correct structure asymptotically (but has other advantages) while the BIC delivers the correct structure so is consistent (if the correct structure is included in the set of possibilities to choose from of course). An even greater confusion can arise by reading that “LASSO is consistent”, since LASSO delivers both structure and estimates so be sure you understand what do the authors mean exactly.

Code needed in the preamble if you want to run the simulation.


histfun <- function(what,breaks=44,mainn,xlim=c(1,3)){
  cexx=1.4;lwdd=4
  hist(beta,breaks=44,freq=F,xlim=c(1,3),main=paste(mainn, ", #Obs = ", TT,sep=""),
       cex.axis=cexx,cex.main=cexx,cex.lab=cexx-.2,col='lightblue')
  den <- density(what)
  abline(v=2,col=3,lwd=lwdd)
  abline(v=mean(what),col=2,lwd=lwdd)
  lines(den,col=2,lwd=lwdd)
}

histfun <- function(what,breaks=44,mainn,xlim=c(1,3)){

cexx=1.4;lwdd=4

hist(beta,breaks=44,freq=F,xlim=c(1,3),main=paste(mainn, ", #Obs = ", TT,sep=""),

cex.axis=cexx,cex.main=cexx,cex.lab=cexx-.2,col='lightblue')

den <- density(what)

abline(v=2,col=3,lwd=lwdd)

abline(v=mean(what),col=2,lwd=lwdd)

lines(den,col=2,lwd=lwdd)

}

You might also like:

15 comments on “Bias vs. Consistency”

JERRY says:

07/06/2015 at 9:08 PM

Thanks, it wouldn’t have been clearer

Ludovic Van den Bergen says:

09/29/2015 at 4:13 PM

This is an amazing website!

I also found this example for (4), from Davidson 2004, page 96

yt=B1+B2*(1/t)+ut with idd ut has unbiased Bs but inconsistent B2.
also
mu=0.01*y1 + 0.99/(n-1) sum_{t=2}^n*yt

1. Eran Raviv says:
  
  10/08/2015 at 7:23 PM
  
  In the book I have it on page 98. Yeah, nice example. Thanks for this.
  
Tomilayo says:

10/13/2018 at 10:16 AM

Thanks so much, this is helpful

Lana says:

10/16/2018 at 11:01 AM

Thank you a lot, everything is clear. The unique thing I can’t get is what is “repet” you used in the loop for in the R code…

1. Eran Raviv says:
  
  11/11/2018 at 2:53 PM
  
  Repet for repetition: number of simulations.
  
Ulla says:

10/30/2018 at 11:14 AM

Thank you very much! The graphics really bring the point home

Hongzhen says:

10/16/2020 at 11:10 PM

Thanks for your works, this is quite helpful for me.

1. Eran Raviv says:
  
  10/22/2020 at 5:41 PM
  
  Thank you, that is the idea.
  
Bijesh Mishra says:

02/02/2021 at 6:27 PM

limit n -> infinity, pr|(b-b-hatt)| = 1 in figure is wrong. It should be 0.

1. Eran Raviv says:
  
  02/02/2021 at 8:34 PM
  
  Thank you – corrected.
  
Shreemann Mishra says:

03/30/2021 at 6:15 AM

histfun says not found? is there any library i should install first?

1. Eran Raviv says:
  
  04/17/2021 at 4:57 PM
  
  Search for “Code needed in the preamble if you want to run the simulation.” the function code is in the post
  
Andres says:

08/05/2023 at 3:44 PM

Thanks for the great explanation. May I recommend using colorblind-friendly colors? Red+green is confusing to many folks. Red+blue would be a safer bet. Thanks!

1. Eran Raviv says:
  
  08/06/2023 at 3:12 PM
  
  Thank you. I will!

Bias vs. Consistency

You might also like:

Optimism of the Training Error Rate

When to avoid traffic if possible

Rython tips and tricks – Snippets

Correlation and correlation structure (9) – Parallelizing Matrix Computation

15 comments on “Bias vs. Consistency”

Leave a Reply to Ludovic Van den Bergen