Imagine you’re picking from 1,000 money managers. If you test just one, there’s a 5% chance you wrongly conclude they’re great. But test 10, and the chance of at least one such false positive jumps to about 40%. To keep your error rate at 5%, you need to control the “family-wise error rate.” One method is to set higher standards for judging a manager’s talent, using a tougher t-statistic cut-off. Instead of the usual 5% cut (t-stat=1.65), you’d use a 0.5% cut per test, which is just the 5% divided across the 10 tests (t-stat=2.58).
When testing 1,000 managers or strategies, the challenge grows further. You’d need a manager with an extremely high t-stat of about 4 to stay within the 5% family-wise error rate. This big jump in the t-stat threshold keeps the error rate in check. However, it is discouragingly strict: a strategy with a t-stat of 4 is a rarity.
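To make the arithmetic concrete, here is a minimal sketch in Python (using scipy’s normal quantile function, and assuming independent one-sided tests with normal test statistics); it just reproduces the back-of-the-envelope Bonferroni-style numbers above, it is not code from any paper.

```python
# Minimal sketch: how the family-wise error rate grows with the number of
# tests, and how tough the per-test t-stat cut-off must become to keep it at 5%.
from scipy.stats import norm

alpha = 0.05
for m in (1, 10, 1000):
    fwer = 1 - (1 - alpha) ** m       # chance of >= 1 false positive at the naive 5% cut
    per_test = alpha / m              # stricter per-test level (5% split across m tests)
    cutoff = norm.ppf(1 - per_test)   # corresponding one-sided t-stat cut-off
    print(f"{m:>4} tests: naive error rate {fwer:6.1%}, required t-stat {cutoff:.2f}")
# 1 test -> 1.64, 10 tests -> 2.58, 1000 tests -> 3.89 ("about 4")
```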
A new paper from Bradley Efron, “Machine learning and the James–Stein estimator”, makes a nice connection between James–Stein shrinkage estimation and the False Discovery Rate (FDR). I explained what FDR is here, so I will not repeat myself.
FDR can be viewed as shrinkage of the significance levels. Rather than deciding that a strategy works only if it has a t-stat of 4, say, you decide that it works even with a lower t-stat, which means you don’t require such a high significance bar for a strategy to “make the cut”. So effectively you shrink the significance level.
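To make that concrete, here is a small sketch of the Benjamini–Hochberg step-up procedure (my own toy Python implementation, not code from either paper): instead of one harsh cut-off for everyone, the bar each p-value must clear depends on its rank among all the p-values, so weaker results can still make the cut.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of 'discoveries' controlling the FDR at level q."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    sorted_p = pvals[order]
    # The i-th smallest p-value only needs to beat i/m * q, a cut-off that
    # relaxes (the required significance "shrinks") as i grows.
    passed = sorted_p <= np.arange(1, m + 1) / m * q
    keep = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])   # largest rank still under its cut-off
        keep[order[:k + 1]] = True
    return keep

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals))            # only the strongest few survive
```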
The brilliance of linking the ideas from Stein’s 1961 paper with Benjamini and Hochberg’s 1995 work is why I never miss a paper from Bradley Efron. It’s thought-provoking. Both papers take a seemingly paradoxical approach: both assume independence, yet despite that assumed independence, “what happens with the others” plays a crucial role in our decision-making.
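To see the “others matter” point on the Stein side, here is a toy positive-part James–Stein estimator (a standard textbook form, not taken from Efron’s paper): each observation is an independent noisy estimate of its own mean, yet the amount of shrinkage applied to any single one is driven by the sum of squares of all of them.

```python
import numpy as np

def james_stein(x, sigma2=1.0):
    """Positive-part James-Stein estimate of a vector of means, assuming
    x[i] ~ N(theta[i], sigma2) independently and len(x) >= 3.
    The shrinkage factor depends on ALL coordinates at once."""
    x = np.asarray(x, dtype=float)
    p = x.size
    factor = max(0.0, 1.0 - (p - 2) * sigma2 / np.sum(x ** 2))
    return factor * x

# Toy example: 1,000 "managers" whose true skill is zero. The raw t-stats
# scatter around zero, but the James-Stein estimates are pulled sharply back
# toward it, because the whole vector looks like noise.
rng = np.random.default_rng(0)
raw = rng.normal(loc=0.0, scale=1.0, size=1000)
print(raw[:3].round(2), "->", james_stein(raw)[:3].round(2))
```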