There are many problems with p-values, and I too have chipped in at times. I recently sat in a presentation of an excellent paper, to be submitted to the highest-ranked journal in the field. The authors did not conceal their ruthless search for those mesmerizing asterisks indicating significance. I was surprised to see how many in the crowd are unaware of the history currently being made around those asterisks.
The web is now swarming with thought-provoking discussions about the recent American Statistical Association (ASA) statement on p-values. Despite the authors' sincere efforts, there is still a lot of back-and-forth over what the principles actually mean. Here is how I read them.
I paraphrase directly from the statement itself. All emphasis mine:
1. P-values can indicate how incompatible the data are with a specified statistical model. This incompatibility can be interpreted as casting doubt on, or providing evidence against the null, or against the underlying assumptions.
So in actuality, the way you set up the test carries a lot of weight when interpreting the resulting p-value.
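A minimal sketch of that point (my own toy example, not from the statement): the very same data gives a different p-value depending on how the test was set up, here one-sided versus two-sided alternatives. The numbers are made up, and the `alternative` argument assumes a reasonably recent SciPy.

```python
import numpy as np
from scipy import stats

# Made-up sample with a positive mean, just for illustration
x = np.array([0.1, 0.4, -0.3, 0.8, 0.2, 0.5, -0.1, 0.6, 0.3, 0.7])

# Two-sided test: H1 is "mean differs from 0"
t_two = stats.ttest_1samp(x, popmean=0.0, alternative="two-sided")
# One-sided test: H1 is "mean is greater than 0"
t_one = stats.ttest_1samp(x, popmean=0.0, alternative="greater")

print(t_two.pvalue, t_one.pvalue)
```

Since the t statistic here is positive, the one-sided p-value is exactly half the two-sided one: same data, a different question, a different p-value.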
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
P-values are never absolute, but only meaningful in relation to what you are doing. A hypothesis which passes a test "holds" only with respect to how you tested it.
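One way to see why the p-value is not the probability that the hypothesis is true (a simulation of my own, with arbitrary settings): when the null is exactly true, p-values are uniformly distributed, so about 5% of tests come out "significant" no matter what. The p-value describes the data under the null, not the null itself.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# 2000 experiments in which the null (mean = 0) is TRUE by construction
pvals = np.array([
    stats.ttest_1samp(rng.normal(0.0, 1.0, 30), popmean=0.0).pvalue
    for _ in range(2000)
])

# Fraction of "significant" results despite the null being true
frac_significant = (pvals < 0.05).mean()
print(frac_significant)  # hovers around 0.05
```

If the p-value measured the probability that the null is true, these experiments should have screamed "null is true"; instead they dutifully produce 5% false alarms.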
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold. A conclusion does not immediately become “true” on one side of the divide and “false” on the other. Researchers should bring many contextual factors into play to derive scientific inferences. Pragmatic considerations often require binary, “yes-no” decisions, but this does not mean that p-values alone can ensure that a decision is correct or incorrect.
Many years ago, in one of my first undergraduate statistics courses, I remember being perplexed by this point myself. I clearly remember asking what we do with those 0.052 p-values; the sharp cutoff seemed absurd. Sometimes a p-value is so close to 0.05 that omitting or adding a single observation would alter the decision whether to reject or not. I got the frowning reply that here we study EXACT science; take that, Prof. D.
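That classroom absurdity is easy to reproduce. In this toy demonstration (numbers picked by hand, not real data), dropping a single observation carries the p-value across the sacred 0.05 line:

```python
import numpy as np
from scipy import stats

# Hand-crafted sample: the one-sample t-test lands just above 0.05
x = np.array([1.34, -0.46, 1.14, -0.26, 1.04, -0.16, 0.74, 0.14, 0.74, 0.14])

p_full = stats.ttest_1samp(x, popmean=0.0).pvalue             # just above 0.05
p_drop = stats.ttest_1samp(np.delete(x, 1), popmean=0.0).pvalue  # below 0.05

print(round(p_full, 3), round(p_drop, 3))
```

One observation (here `x[1]`) is all that separates "reject" from "fail to reject", which is exactly why a conclusion should not flip between "true" and "false" at the threshold.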
4. Proper inference requires full reporting and transparency.
Sometimes researchers search across many different hypotheses and/or specifications, and end up reporting the one which shows significance. The rationale is that you have a better chance of making it to publication if you can show at least some significant findings. However, when this happens the reported p-values must be adjusted, as they otherwise lose their original meaning. More on this here: Don’t believe anything you read.
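A quick simulation of my own (arbitrary settings) shows the scale of the problem: with 20 independent hypotheses that are all truly null, the chance of at least one "significant" finding is about 1 − 0.95^20 ≈ 64%. A Bonferroni-style adjustment, one of several standard corrections, brings the error rate back to roughly the intended 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_tests, n_sims = 20, 1000

hits_naive = 0  # simulations with at least one p < 0.05
hits_bonf = 0   # simulations with at least one p < 0.05 / n_tests (Bonferroni)

for _ in range(n_sims):
    # 20 tests, every null true by construction
    pvals = np.array([
        stats.ttest_1samp(rng.normal(0.0, 1.0, 30), popmean=0.0).pvalue
        for _ in range(n_tests)
    ])
    hits_naive += pvals.min() < 0.05
    hits_bonf += pvals.min() < 0.05 / n_tests

print(hits_naive / n_sims, hits_bonf / n_sims)
```

Reporting only the winning test out of twenty, at the unadjusted threshold, is how a roughly two-in-three fluke gets dressed up as a one-in-twenty finding.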
Though I agree wholeheartedly, I don’t see things changing on that front any time soon. For all to be transparent, we all need to step out of the darkness together, yet the “invisible hand” actually guides us the other way. I find it hard to picture the first wave of researchers submitting a paper whose bottom line is “I tried, but have only ‘I tried’ to show for it”. The current equilibrium, where nearly no one reveals everything, is an absorbing one, in my opinion.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
This is very much related to the present day, where you sometimes have one million observations. Is it significant? Really? (read sarcastically). We must now also demand so-called human significance, or economic significance; basically, the “does it matter?” question.
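Here is a sketch of that large-n point (simulated data, with a made-up effect size): give a million observations an effect of half a percent of one standard deviation, and the p-value is decisively "significant" while the effect itself is negligible by any human measure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 1_000_000

# True mean is 0.005: an effect of 0.5% of one standard deviation
x = rng.normal(loc=0.005, scale=1.0, size=n)

res = stats.ttest_1samp(x, popmean=0.0)
effect = x.mean() / x.std(ddof=1)  # standardized effect size, Cohen's d style

print(res.pvalue, effect)
```

With n this large, the standard error shrinks to nothing and the asterisks arrive regardless; the effect size is what answers "does it matter?".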
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
A good wrap-up. Not “don’t use it”, but most decidedly “don’t rely on it alone”.
Though, as I mentioned at the outset, there are still ongoing discussions as to how strongly the ASA feels regarding p-values, one thing is clear: this is a long-overdue step forward.
I strongly believe blogging has an actual impact, with human significance, that is.
It is delightful to be able to read what very smart and knowledgeable people are saying; instead of waiting a few weeks until a “Comment on the ASA statement” is published, we can simply visit their sites.
I see high-end companies do it. I see top researchers do it. I see central bankers do it. It was the first time I had seen a blog post cited in such a prestigious setting, and it was very pleasing. From the statement: “A week later, statistician and “Simply Statistics” blogger Jeff Leek responded…”. And the actual reference:
Leek, J. (2014), “On the scalability of statistical procedures: why the p-value bashers just don’t get it,” Simply Statistics blog, available here.