Machine learning is simply statistics

Another opinion piece.

If you can’t explain it simply you don’t understand it well enough.
(Albert Einstein)

A bit on Deep Learning

What is so deep about deep learning? Nothing. There is nothing deep about it. If you read through the excellent Deep Learning book you can see (p. 167 in my copy) that a deep learning model with, say, three layers, omitting the dependency on parameters, can be written as

$f(\boldsymbol{x})=f^{(3)}\left(f^{(2)}\left(f^{(1)}(\boldsymbol{x})\right)\right).$

In words, the whole shebang boils down to a highly non-linear transformation of the original variables. The word “deep” is not bad in that it gives a feel for the kind of numerical procedures those models need. We don’t have a better word, but it is just a convention describing the number of links in these “chain structures”; it carries no real meaning otherwise. So what if deep learning models are highly non-linear, and so what if we apply fanciful optimization methods along the way? Put differently, deep learning models are simply a sub-class of the “usual” non-parametric statistics. This statement neither upgrades nor downgrades this class of models; it is only meant to drive home the point that machine learning is simply statistics.
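To make the chain structure concrete, here is a minimal sketch in plain Python. The `layer` helper, the weights, the biases and the choice of `tanh` are all made up for illustration; they are not taken from the book. Each layer is an affine map followed by an elementwise non-linearity, and the model is nothing more than their composition.

```python
import math

def layer(weights, biases, activation):
    """Build a function h -> activation(W h + b), applied elementwise.

    `weights` is a list of rows, `biases` a list with one entry per row.
    All numbers used below are arbitrary illustrative values.
    """
    def apply(h):
        return [
            activation(sum(w * v for w, v in zip(row, h)) + b)
            for row, b in zip(weights, biases)
        ]
    return apply

def identity(z):
    return z

# Three hypothetical layers: two hidden layers with tanh, one linear output layer.
f1 = layer([[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1], math.tanh)
f2 = layer([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0], math.tanh)
f3 = layer([[2.0, -1.0]], [0.5], identity)

def f(x):
    # The whole "deep" model: f(x) = f3(f2(f1(x))).
    return f3(f2(f1(x)))

prediction = f([1.0, 2.0])
print(prediction)
```

Adding depth means adding one more function to the chain; nothing else about the construction changes.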

In the same Deep Learning book, after you are done reading the pleasantly thorough Machine Learning Basics chapter, which example do you think is first in line? Linear regression! No different from Legendre’s 1805 method of least squares.
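As a rough illustration (my own toy example, not the book's code), the sketch below fits a straight line twice: once with the textbook closed-form least-squares formulas, and once by "training" with gradient descent on the mean squared error. The data, the function names, the learning rate and the number of steps are all assumptions picked for the demonstration.

```python
def ols_closed_form(xs, ys):
    """Least-squares intercept and slope for y = a + b*x, in closed form."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def ols_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Minimize the mean squared error of y = a + b*x by plain gradient descent."""
    n = len(xs)
    a, b = 0.0, 0.0
    for _ in range(steps):
        residuals = [a + b * x - y for x, y in zip(xs, ys)]
        grad_a = 2.0 / n * sum(residuals)
        grad_b = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b

# Made-up data, roughly y = 1 + 2x plus noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

closed = ols_closed_form(xs, ys)
learned = ols_gradient_descent(xs, ys)
print(closed, learned)
```

On this data the two routes should agree to many decimal places: the "learning" procedure is just another way of arriving at the least-squares solution.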

Do you think you don’t understand what convolution is? Have you ever applied a moving average to a time series? That is a one-dimensional convolution. You never explained it by saying that you convolved your time series with a box-shaped function, did you? No, you said you used a moving average. Again, I don’t mind the language we use, but whichever way you look, you can always spot plain mainstream statistics, camouflaged in different terms pulled probably from computer science or the like.
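A small sketch of that equivalence, using only plain Python. The series, the window length and both helper functions are hypothetical and written for this illustration: one function computes a moving average the way you would describe it in words, the other convolves the series with a box-shaped kernel.

```python
def convolve_valid(signal, kernel):
    """Discrete convolution, keeping only positions where the kernel fully overlaps."""
    n, k = len(signal), len(kernel)
    flipped = kernel[::-1]  # convolution flips the kernel; a box is symmetric anyway
    return [
        sum(signal[i + j] * flipped[j] for j in range(k))
        for i in range(n - k + 1)
    ]

def moving_average(series, window):
    """Plain moving average over every full window of the series."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

series = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0]  # made-up time series
window = 3
box = [1.0 / window] * window            # box-shaped kernel with weights summing to one

averaged = moving_average(series, window)
convolved = convolve_valid(series, box)
print(averaged)
print(convolved)
```

Up to floating-point rounding the two lists are the same. If NumPy is available, `numpy.convolve(series, box, mode="valid")` computes the same quantity.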

Vindication, in a way...

Victor Chernozhukov is one of those world-class econometricians I try to follow. In a recent talk he gave (min 1:40 in this YouTube link) he mentions the “new generation of non-parametric statistical methods, branded as ‘machine learning’”. At around minute 10 of the same video, he makes a joke about the Frisch–Waugh–Lovell theorem, saying its authors were machine learning researchers who worked back in the 1930s.

So it was nice to see that I am not alone in holding this somewhat less popular view.