## Statistical Shrinkage (3)

Imagine you’re picking from 1,000 money managers. If you test just one, there’s a 5% chance you might wrongly think they’re great. But test 10, and your error chance jumps to 40%. To keep your error rate at 5%, you need to control the “family-wise error rate.” One method is to set higher standards for judging a manager’s talent, using a tougher t-statistic cut-off. Instead of the usual 5% cut (t-stat=1.65), you’d use a 0.5% cut (t-stat=2.58).

When testing 1,000 managers or strategies, the challenge increases. You’d need a manager with an extremely high t-stat of about 4 to stay within the 5% error rate. This big jump in the t-stat threshold helps keep the error rate in check. However, that is discouragingly strict: a strategy with a t-stat of 4 is a rarity.
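To make the arithmetic concrete, here is a minimal sketch of those numbers. It rests on assumptions the excerpt does not spell out: the tests are independent, the t-stat is approximated by a standard normal, the test is one-sided, and the stricter cut-off is obtained by dividing the 5% level by the number of tests (a Bonferroni-style correction, which is only one of several possible methods). The function names are mine, for illustration only.

```python
from statistics import NormalDist


def family_wise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_cutoff(n_tests, alpha=0.05):
    """One-sided normal cut-off when each test is run at level alpha / n_tests.

    Assumption: normal approximation to the t-stat, Bonferroni-style split.
    """
    return NormalDist().inv_cdf(1.0 - alpha / n_tests)


for n in (1, 10, 1000):
    fwer = family_wise_error_rate(n)
    cutoff = bonferroni_cutoff(n)
    print(f"tests={n:5d}  uncorrected error rate={fwer:.3f}  cut-off={cutoff:.3f}")
```

With 10 tests the uncorrected error rate comes out at roughly 40%, and with 1,000 tests the required cut-off lands a little under 4, in line with the figures quoted above.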

## Statistical Shrinkage (2)

During 2017 I blogged about Statistical Shrinkage. At the end of that post I mentioned the important role signal-to-noise ratio (SNR) plays when it comes to the need for shrinkage. This post shares some recent related empirical results published in the Journal of Machine Learning Research from the paper Randomization as Regularization. While mainly for tree-based algorithms, the intuition undoubtedly extends to other numerical recipes also.

## Bayesian vs. Frequentist in Practice, part 3

This post is inspired by Leo Breiman’s opinion piece “No Bayesians in foxholes”. The saying “there are no atheists in foxholes” refers to the fact that if you are in the foxhole (being bombarded..), you pray! Leo’s paraphrase indicates that when complex, real problems are present, there are no Bayesians to be found.

## Why complex models are data-hungry?

If you regularly read this blog then you know I am not one to jump on the “AI Bandwagon”, being quick to grow wary of anyone flashing the “It’s Artificial Intelligence” joker card. Don’t get me wrong, I understand it is a sexy term, but to me it always feels a bit like a sales pitch.

If the machine does anything (artificially) intelligent it means that the model at the back is complex, and complex models need massive (massive I say) amounts of data. This is because of the infamous Curse of dimensionality.

I know it. You know it. Complex models need a lot of data. You have read this fact, even written it at some point. But why is it the case? “So we get a good estimate of the parameter, and a good forecast thereafter”, you reply. I accept. But.. what is it about simple models that lets them make do with much less data than complex models? Why do I always recommend starting simple? And why is the literature around shrinkage and overfitting as prolific as it is?
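One standard way to get a feel for the Curse of dimensionality mentioned above is the following back-of-the-envelope sketch. It assumes, purely for illustration, that the data are spread uniformly over a unit hypercube, and it asks how long the edge of a sub-cube must be in order to contain a given fraction of the observations. The function name is hypothetical.

```python
def edge_length_for_fraction(fraction, n_dims):
    """Edge of a sub-cube holding `fraction` of uniformly spread data.

    Assumption: points are uniform on the unit hypercube in n_dims dimensions,
    so a sub-cube with edge e contains a share e ** n_dims of them.
    """
    return fraction ** (1.0 / n_dims)


for d in (1, 2, 10, 100):
    edge = edge_length_for_fraction(0.10, d)
    print(f"dims={d:3d}  edge needed to capture 10% of the data={edge:.3f}")
```

In one dimension a tenth of the range holds a tenth of the data. In ten dimensions you need to span about 80% of every axis to capture the same tenth, so "local" neighbourhoods stop being local unless the sample grows enormously.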

## LASSO, LASSO, LASSO

LASSO stands for Least Absolute Shrinkage and Selection Operator. It was first introduced 21 years ago by Robert Tibshirani (Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B). In 2004 four statistical masters, Efron, Hastie, Johnstone and Tibshirani, joined together to write the paper Least angle regression, published in the Annals of Statistics. It is that paper that sent the LASSO to the podium. The reason? They removed a computational barrier. Armed with a new ingenious geometric interpretation, they presented an algorithm for solving the LASSO problem. The algorithm is as simple as solving an OLS problem, and with computer code to accompany their paper, the LASSO was set for its liftoff*.

Overall, the LASSO reduces model complexity. It does this by completely excluding some variables, using only a subset of the original potential explanatory variables. Since this can make the story the model tells clearer, the reduction in complexity is a desirable property. The clarity of the authors’ exposition and well-rehashed computer code are further reasons for the fully justified, full-fledged LASSO flareup.
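To see how variables get excluded completely rather than merely shrunk, consider the special case of an orthonormal design, where the LASSO solution is known to reduce to soft-thresholding of the OLS coefficients. The sketch below assumes that special case and the objective scaled as half the residual sum of squares plus the penalty times the sum of absolute coefficients. The coefficient values and the penalty are made-up numbers, and this is not the least angle regression algorithm from the paper.

```python
def soft_threshold(ols_coef, penalty):
    """LASSO coefficient under an orthonormal design (illustrative special case).

    Moves the OLS coefficient toward zero by `penalty`; anything smaller than
    the penalty in absolute value is set exactly to zero.
    """
    if ols_coef > penalty:
        return ols_coef - penalty
    if ols_coef < -penalty:
        return ols_coef + penalty
    return 0.0


# Hypothetical OLS estimates and penalty, chosen only for illustration.
ols_coefs = [2.5, -0.3, 0.8, -1.7, 0.1]
penalty = 0.5

lasso_coefs = [soft_threshold(b, penalty) for b in ols_coefs]
kept = [i for i, b in enumerate(lasso_coefs) if b != 0.0]

print("LASSO coefficients:", lasso_coefs)
print("indices of variables kept:", kept)
```

The two small coefficients are dropped outright while the remaining three are pulled toward zero, which is the shrinkage-plus-selection behaviour the name describes.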

This is not a LASSO tutorial. Google-search results, undoubtedly refined over years of increased popularity, are clear enough by now. Also, if you are still reading this I imagine you already know what the LASSO is and how it works. To continue from this point, what follows is a selective list of milestones from the academic literature: some theoretical and practical extensions.

## Statistical Shrinkage

Shrinkage in statistics has increased in popularity over the decades. Now statistical shrinkage is commonplace, explicitly or implicitly.

But when do we need to make use of shrinkage? At least partly, it depends on the signal-to-noise ratio.