Matrix Multiplication as a Linear Transformation

AI algorithms are in the air. The success of those algorithms is largely attributed to dimension expansions, which makes it important for us to consider that aspect.

Matrix multiplication can be beneficially perceived as a way to expand the dimension. We begin with a brief discussion on PCA. Since PCA is predominantly used for reducing dimensions, and since you are familiar with PCA already, it serves as a good springboard by way of a contrasting example for dimension expansion. Afterwards we show some basic algebra via code, and conclude with a citation that provides the intuition for the reason dimension expansion is so essential.

Continue reading

Statistical Shrinkage (3)

Imagine you’re picking from 1,000 money managers. If you test just one, there’s a 5% chance you might wrongly think they’re great. But test 10, and your error chance jumps to 40%. To keep your error rate at 5%, you need to control the “family-wise error rate.” One method is to set higher standards for judging a manager’s talent, using a tougher t-statistic cut-off. Instead of the usual 5% cut (t-stat=1.65), you’d use a 0.5% cut (t-stat=2.58).

When testing 1,000 managers or strategies, the challenge increases. You’d need a manager with an extremely high t-stat of about 4 to stay within the 5% error rate. This big jump in the t-stat threshold helps keep the error rate in check. However that is discouragingly strict: a strategy which t-stat of 4 is rarity.

Continue reading

Statistical Shrinkage (2)

During 2017 I blogged about Statistical Shrinkage. At the end of that post I mentioned the important role signal-to-noise ratio (SNR) plays when it comes to the need for shrinkage. This post shares some recent related empirical results published in the Journal of Machine Learning Research from the paper Randomization as Regularization. While mainly for tree-based algorithms, the intuition undoubtedly extends to other numerical recipes also.

Continue reading

Trees 1 – 0 Neural Networks

Tree-based methods like decision trees and their powerful random forest extensions are one of the most widely used machine learning algorithms. They are easy to use and provide good forecasting performance off the cuff more or less. Another machine learning community darling is the deep learning method, particularly neural networks. These are ultra flexible algorithms with impressive forecasting performance even (and especially) in highly complex real-life environments.

This post is shares:

  • Two academic references lauding the powerful performance of tree-based methods.
  • Because both neural networks and tree-based methods are able to capture non-linearity in the data, it’s not easy to choose between them. Those references help form an opinion with regards to when one should use neural networks and when tree-based methods are preferable, if you don’t have time to implement both (which is usually the case).
  • Continue reading

    Beware of Spurious Factors

    The word spurious refers to “outwardly similar or corresponding to something without having its genuine qualities.” Fake.

    While the meanings of spurious correlation and spurious regression are common knowledge nowadays, much less is understood about spurious factors. This post draws your attention to recent, top-shelf, research flagging the risks around spurious factor analysis. While formal solutions are still pending there are couple of heuristics we can use to detect possible problems.

    Continue reading

    Hyper-Parameter Optimization using Random Search

    Hyper-parameters are parameters which are not estimated as an integral part of the model. We decide on those parameters but we don’t estimate them within, but rather beforehand. Therefore they are called hyper-parameters, as in “above” sense.

    Almost all machine learning algorithms have some hyper-parameters. Data-driven choice of hyper-parameters means typically, that you re-estimate the model and check performance for different hyper-parameters’ configurations. This adds considerable computational burden. One popular approach to set hyper-parameters is based on a grid-search over possible values using the validation set. Faster and simpler ways to intelligently choose hyper-parameters’ values would go a long way in keeping the stretched computational cost at a level you can tolerate.

    Enter the paper “Random Search for Hyper-Parameter Optimization” by James Bergstra and Yoshua Bengio, suggesting with a straight face not to use grid-search but instead, look for good values completely at random. This is very counterintuitive, for how can a random guesses within some region compete with systematically covering the same region? What’s the story there?

    Below I share the message of that paper, along with what I personally believe is actually going on (and the two are very different).

    Continue reading

    Local Linear Forests

    Random forests is one of the most powerful pure-prediction algorithms; immensely popular with modern statisticians. Despite the potent performance, improvements to the basic random forests algorithm are still possible. One such improvement is put forward in a recent paper called Local Linear Forests which I review in this post. To enjoy the read you need to be already familiar with the basic version of random forests.

    Continue reading

    Publication in Significance – code

    Couple of months ago I published a paper in Significance – couple of pages describing the essence of deep learning algorithms, and why they are so popular. I got a few requests for the code which generated the figures in that paper. This weekend I reviewed my code and was content to see that I used a pseudorandom numbers, with a seed (as oppose to completely random numbers; without a seed). So now the figures are exactly reproducible. The actual code to produce the figures, and the figures themselves (e.g. for teaching purposes) are provided below.

    Continue reading

    A New Parameterization of Correlation Matrices

    In volatility modelling, a typical challenge is to keep the covariance matrix estimate valid, meaning (1) symmetric and (2) positive semi definite*. A new paper published in Econometrica (citing from the paper) “introduces a novel parametrization of the correlation matrix. The reparametrization facilitates modeling of correlation and covariance matrices by an unrestricted vector, where positive definiteness is an innate property” (emphasis mine). Econometrica is known to publish ground-breaking research, and you may wonder: what is the big deal in being able to reparametrise the correlation matrix?

    Continue reading

    What’s the big idea? Deep learning algorithms

    Deep learning algorithms are increasingly featuring in popular news outlets, large-scale media events and academic conferences. But what makes them so popular? Why now?

    I recently published what I hope is an easy read for all of you modern-statistics geeks lovers; explaining the thrust behind this machine-learning class of models.

    You can download the two-pager from Significance, specifically here (subscription required).

    Continue reading

    Beta in the tails

    Every form of strength is also a form of weakness*. I love statistics, but I focus to much on methodology, which is not for everyone. Some people (right or wrong) question: “wonderful sir, but what can I do with it?”.

    A new paper titled “Beta in the tails” is a showcase application for why we should focus on correlation structure rather than on average correlation. They discuss the question: Do hedge funds hedge? The reply: No, they don’t!

    The paper “Beta in the tails” was published in the Journal of Econometrics but you can find a link to a working paper version below. We start with a figure replicated from the paper, go through the meaning and interpretation of it, and explain the methods used thereafter.

    Continue reading

    How flexible neural networks really are?


    A distinctive power of neural networks (neural nets from here on) is their ability to flex themselves in order to capture complex underlying data structure. This post shows that the expressive power of neural networks can be quite swiftly taken to the extreme, in a bad way.

    What does it mean? A paper from 1989 (universal approximation theorem, reference below) shows that any reasonable function can be approximated arbitrarily well by fairly a shallow neural net.

    Speaking freely, if one wants to abuse the data, to overfit it like there is no tomorrow, then neural nets is the way to go; with neural nets you can perfectly map your fitted values to any data shape. Let’s code an example and explain the meaning of this.

    Continue reading

    Correlation and correlation structure (5) – a new coefficient of correlation

    This is the fifth post which is concerned with quantifying the dependence between variables. When talking correlations one usually thinks about linear correlation, aka Pearson’s correlation. One serious limitation of linear correlation is that it’s, well.. linear. By construction it’s not useful for detecting non-monotonic relation between variables. Here I share some recent academic research, a new way to detect associations that are not monotonic.

    Continue reading

    Correlation and correlation structure (4) – asymmetric correlations of equity portfolios

    Here I share a refreshing idea from the paper “Asymmetric correlations of equity portfolios” which was published in the Journal of financial Economics, a top tier journal in this field. The question is how much the observed conditional correlation on the downside (say) differs from the conditional correlation you would expect from a symmetrical distribution. You can find here an explanation for the H-statistic developed in the aforementioned paper and some code for illustration.

    Continue reading