We are now collecting a lot of data. This is a good thing in general. But data collection and data storage capabilities have evolved fast, much faster than the statistical methods meant to go along with those voluminous numbers. We are still using good ol' fashioned Fisherian statistics. Back then, when you did not have too many observations, statistical significance actually meant something.
That is not necessarily the case today, a fact which scientists are increasingly coming to terms with (see bullet number 5 in the recent ASA statement on p-values).
The following plot is a simple illustration. I simulate from a simple regression model, with a realistic information-to-noise ratio. You can see the magnitude of the coefficient (in black) which goes along with a significant t-stat. Overlaid in green is the value 2, which is routinely used as the threshold (critical value). Both are plotted as a function of the number of observations used for estimation. Pretty much statistics 101.
You can see how quickly the magnitude of the coefficient drops while maintaining its ‘statistically significant’ status. I stopped at about 10K observations, which is not much.
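A quick back-of-the-envelope sketch shows why, assuming the textbook OLS standard error and the same setup as the simulation at the bottom of the post (sd(x) = 1, residual standard deviation $\sigma$):

$$ t = \frac{\hat{\beta}}{\operatorname{se}(\hat{\beta})}, \qquad \operatorname{se}(\hat{\beta}) \approx \frac{\sigma}{\operatorname{sd}(x)\sqrt{n}} \quad\Longrightarrow\quad |\hat{\beta}| \gtrsim \frac{2\sigma}{\operatorname{sd}(x)\sqrt{n}} \ \text{ for } |t| \ge 2. $$

So the smallest coefficient that clears the bar shrinks towards zero at rate $1/\sqrt{n}$, whether or not the effect matters to anyone; this is exactly the 2*sig0/sqrt(cc[i]) line in the code below.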
Take this paper for example, which talks about the value added of good teachers; it covers data on 2.5 million students and 18 million tests in math and English. How large does the effect of a coefficient need to be in order to be knighted as significant?
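For a rough sense of scale, here is the same back-of-the-envelope in R, reusing the noise level from the simulation at the bottom of the post (sigma = 4, sd(x) = 1); the helper function and the numbers are mine for illustration, not taken from the paper:

# smallest slope that clears a t-statistic of about 2,
# assuming sd(x) = 1 and residual sd sigma (simulation values, not the paper's)
min_significant_beta <- function(n, sigma = 4) 2 * sigma / sqrt(n)

min_significant_beta(c(100, 10^4, 2.5 * 10^6))
# roughly 0.8, 0.08 and 0.005 -- with millions of observations,
# a coefficient that moves y by a fraction of a percent of its own
# standard deviation already gets knighted as significant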
Sure, with the abundance of data it is easier to recognize which coefficients have no effect. But we should redress, or redefine, what “no effect” means. The top panel of the next figure shows a statistically significant effect of x on y (using the same simulation). It looks as if the effect is real. That is because it is; it was simulated as such. But as humans, we need to look at figures like the one in the bottom panel. There, we can (and should) maintain a y-scale that is relevant for us, rather than one chosen to mesmerize a referee.
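As a minimal sketch of that idea (hypothetical code, not the original figure; it reuses the simulation's setup), the same fitted slope looks dramatic when the y-axis is zoomed into the fitted values, and negligible when the y-axis spans the data itself:

set.seed(1)
n    <- 10^4
sig0 <- 4
x    <- rnorm(n)
y    <- 2 + 0.1 * x + rnorm(n, sd = sig0)   # a small slope, of the size the simulation produces
fit  <- lm(y ~ x)

par(mfrow = c(2, 1))
# top panel: y-axis zoomed into the fitted values -- the slope looks substantial
plot(x, fitted(fit), ylab = "fitted y", main = "zoomed-in y-scale")
# bottom panel: y-axis on the scale of the data itself -- the same slope is barely visible
plot(x, y, col = "grey", main = "y-scale relevant for us")
abline(fit, lwd = 2)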
Perhaps in the past there was no need to continue the discussion once the researcher had shown significance. Today, when tiny effects can nonetheless be honored as “statistically significant”, we must.
In finance it is quite easy. Findings are immediately followed by the usual “Can we make money out of that?” question. In other fields, I am not sure what the question should be. But there should be such a question.
Say personal income increases, on average, by 1% as a result of great tutoring, statistically significant of course. Is this helpful for policy decision making? We should be aware that statistical significance perhaps no longer fulfills the same role it once did. Other terms, like human significance and economic significance, are now called for. Those are more applicable in this era of big data.
cc  <- seq(10, 10^4, 100)             # sample sizes, from 10 up to 10,000
nn  <- length(cc)
bet <- NULL
RR  <- 15                             # replications per sample size
tval <- matrix(nrow = nn, ncol = RR)

for (i in 1:nn) {
  for (j in 1:RR) {
    x    <- rnorm(n = cc[i], mean = 0, sd = 1)
    sig0 <- 4
    eps  <- rnorm(n = cc[i], mean = 0, sd = sig0)
    bet[i] <- (2 * sig0) / sqrt(cc[i])            # the coefficient shrinks like 1/sqrt(n)
    y    <- 2 + bet[i] * x + eps
    tval[i, j] <- summary(lm(y ~ x))$coef[2, 3]   # t-statistic of the slope
  }
}

# we stabilize the t-values for better readability;
# on average they are 2, but they are volatile in this example
tval <- apply(tval, 1, mean)

plot(bet)                   # magnitude of the coefficient (black)
lines(tval, col = "green")  # averaged t-values, hovering around the critical value of 2
Loved this one!
Significance is a subtle and confusing matter. You have a case here of statistical significance, but economically/humanly it may not be significant. Sometimes economics papers have insignificant results but still talk about a possible economic effect. And I had an experience with a statistically insignificant medical treatment which changed our lives (the doctor said there is no evidence that dietary changes can help my 4-month-old son’s eczema, and yet they did help tremendously). I guess the latter would simply be called an exception to the rule.