Statistics and Econometrics

Correlation and correlation structure (10) – Inverse Covariance

Blog, Finance and Trading, Statistics and Econometrics | Posted on 09/29/2025

The covariance matrix is central to many statistical methods. It tells us how variables move together, and its diagonal entries, the variances, are very much our go-to measure of uncertainty. But the real action lives in its inverse, which goes by two names: the precision matrix and the concentration matrix. Where did these terms come from? This post explains their origin and why the inverse of the covariance is named that way. I doubt this has kept you up at night, but I still think you'll find it interesting.
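The name "precision" is easiest to see in one dimension: the precision of a normal variable is $1/\sigma^2$, so a small variance means high precision. The minimal NumPy sketch below uses illustrative covariance matrices that are not from the post. It shows that for uncorrelated variables the precision matrix is just the reciprocal of the variances. With correlation, each diagonal entry of the precision matrix equals one over the *conditional* variance, meaning the variance left after linearly accounting for the other variables.

```python
import numpy as np

# Uncorrelated case: the covariance is diagonal, so the precision is simply 1 / variance.
sigma_diag = np.diag([4.0, 1.0, 0.25])
precision_diag = np.linalg.inv(sigma_diag)
print(np.diag(precision_diag))  # values 0.25, 1.0, 4.0

# Correlated case: the precision diagonal equals 1 / conditional variance, i.e. the
# residual variance of X0 after a linear regression on X1.
sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
precision = np.linalg.inv(sigma)
cond_var_x0 = sigma[0, 0] - sigma[0, 1] ** 2 / sigma[1, 1]  # 1 - 0.64 = 0.36
print(precision[0, 0], 1.0 / cond_var_x0)  # both approximately 2.7778
```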

Dot Product in the Attention Mechanism

Blog, Statistics and Econometrics | Posted on 07/28/2025

The dot product of two embedding vectors $\mathbf{x}$ and $\mathbf{y}$ with dimension $d$ is defined as

$\mathbf{x} \cdot \mathbf{y} = x_1y_1 + x_2y_2 + \dots + x_dy_d.$

Hardly the first thing that jumps to mind when thinking about a "similarity score". Indeed, the result of a dot product is a single number (a scalar) with no predefined range (e.g. it is not bounded between zero and one), so it is hard to judge whether a particular score is high or low on its own. Still, the Transformer family of deep learning models relies heavily on the dot product in the attention mechanism to weigh the importance of different parts of the input sentence. This post explains why the dot product, which seems like an odd pick for a "similarity score", actually makes good sense.
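This short NumPy sketch illustrates two standard facts that make the dot product a workable score. First, $\mathbf{x} \cdot \mathbf{y} = \lVert\mathbf{x}\rVert\,\lVert\mathbf{y}\rVert\cos\theta$, so the score combines direction (the cosine) with magnitude. Second, the usual scaled dot-product attention in Transformers divides scores by $\sqrt{d}$ and passes them through a softmax, which turns unbounded scores into positive weights that sum to one. The vectors are random placeholders rather than learned query/key projections, and the helper names are illustrative.

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between a and b: the dot product with the lengths divided out."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def attention_weights(q, K):
    """Scaled dot-product scores of query q against each row of K, turned into
    positive weights that sum to one with a numerically stable softmax."""
    scores = K @ q / np.sqrt(q.shape[0])   # divide by sqrt(d) to keep scores moderate
    scores = scores - scores.max()         # subtract the max before exp for stability
    w = np.exp(scores)
    return w / w.sum()

rng = np.random.default_rng(0)
d = 8
x, y = rng.normal(size=d), rng.normal(size=d)

# x . y = |x| |y| cos(theta): doubling y doubles the dot product, the cosine is unchanged.
print(x @ (2 * y) / (x @ y))                       # 2.0
print(np.isclose(cosine(x, 2 * y), cosine(x, y)))  # True

K = rng.normal(size=(5, d))                        # 5 hypothetical key vectors
w = attention_weights(x, K)
print(w.round(3), w.sum())                         # five positive weights summing to 1
```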

Understanding Word Embeddings (2) – Geometry

Blog, Statistics and Econometrics | Posted on 06/02/2025

I have noticed that when I use the term "coordinates" to talk about vectors, it doesn't always click for everyone in the room. The previous post covered the algebra of word embeddings; this post explains why you should think of a word-embedding vector simply as coordinates in space. We skip the 1D and 2D cases since they are straightforward. Four dimensions is too complicated for me to gif around with, so three dimensions will have to suffice for our illustrations.
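Reading an embedding as coordinates means each word is a point, so ordinary geometry such as distances applies directly. The minimal sketch below uses three *hypothetical* 3D word vectors with invented values, purely for illustration. Nearby points correspond to words with similar coordinates.

```python
import numpy as np

# Hypothetical 3D embeddings: each vector is simply the (x, y, z) position of a word.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.85, 0.75, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def euclidean(a, b):
    """Straight-line distance between two points in 3D space."""
    return float(np.linalg.norm(a - b))

for w1, w2 in [("king", "queen"), ("king", "apple")]:
    print(w1, w2, round(euclidean(emb[w1], emb[w2]), 3))  # about 0.122, then about 1.281
```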

Understanding Word Embeddings (1) – Algebra

Blog, Statistics and Econometrics | Posted on 05/07/2025

Some time back I explained that matrix multiplication can be viewed as a linear transformation. Having that perspective helps to grasp the inner workings of all AI models across various domains (audio, images, etc.). Building on that, the next couple of posts will help you understand the inputs used in these matrix multiplications, specifically for those who want to understand how text-based models and LLMs function. Our focus is on the infamous one-hot encoding, as it is the key to unlocking the underlying theory. It will provide you, I hope, with the often-elusive intuition behind word embeddings.
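The algebraic point behind one-hot encoding fits in a few lines. Multiplying a one-hot row vector by an embedding matrix selects exactly one row, so an "embedding lookup" is just a matrix multiplication. The vocabulary and the random embedding matrix below are hypothetical placeholders.

```python
import numpy as np

vocab = ["the", "cat", "sat", "mat"]   # hypothetical 4-word vocabulary
V, d = len(vocab), 3                    # vocabulary size, embedding dimension
rng = np.random.default_rng(1)
E = rng.normal(size=(V, d))             # embedding matrix: one row per word

def one_hot(word):
    """Row vector of length V with a single 1 at the word's index."""
    v = np.zeros(V)
    v[vocab.index(word)] = 1.0
    return v

# One-hot times the embedding matrix picks out that word's row: lookup == matrix multiplication.
x = one_hot("cat")
print(np.allclose(x @ E, E[vocab.index("cat")]))  # True
```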

Correlation and correlation structure (9) – Parallelizing Matrix Computation

Blog, Code, Finance and Trading, Statistics and Econometrics | Posted on 04/13/2025

Datasets have grown from large to massive, and so we increasingly find ourselves refactoring for readability and prioritizing computational efficiency (speed). The computing time for the ever-important sample covariance estimate of a dataset $X \in \mathbb{R}^{n \times p}$, with $n$ observations and $p$ variables, is $\mathcal{O}(n p^2)$. Although a single covariance calculation is still manageable for today's large datasets, it is computationally prohibitive to use the bootstrap or related resampling methods, which require very many repetitions, each demanding its own covariance computation. Without fast computation, the bootstrap remains impractical for high-dimensional problems. And that, we can undoubtedly all agree, is a tragedy.
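To make the cost concrete, here is a naive bootstrap of the covariance matrix. Each of the $B$ resamples recomputes a $p \times p$ covariance at $\mathcal{O}(n p^2)$, so the total is $\mathcal{O}(B n p^2)$. This is a minimal sketch with small, illustrative sizes. The function name `bootstrap_cov` is hypothetical and is not the post's code.

```python
import numpy as np

def bootstrap_cov(X, B, seed=0):
    """Naive bootstrap of the sample covariance: B resamples of the rows of X,
    each requiring its own O(n p^2) covariance computation."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    covs = np.empty((B, p, p))
    for b in range(B):
        idx = rng.integers(0, n, size=n)          # resample observations with replacement
        covs[b] = np.cov(X[idx], rowvar=False)    # p x p covariance for this resample
    return covs

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 20))                    # small illustrative sizes
covs = bootstrap_cov(X, B=200)
se = covs.std(axis=0)                             # bootstrap standard error of each entry
print(covs.shape, se[0, 1])
```

Doubling $B$ doubles the run time, which is why this loop is the natural target for parallelization.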

So, what can we do to restore resampling methods to our toolkit? We can reduce computing times, and appreciably so, if we compute in parallel, cutting waiting times from overnight to minutes, or even seconds. Relatedly, I wrote a post about Randomized Matrix Multiplication where I offer a computationally cheaper approximation instead of the exact but slower procedure.

This post was inspired by a question from Laura Balzano (University of Michigan), who asked whether parallel computing could deliver an exact solution rather than the approximation shown in that other post. I spent some time thinking about it, and indeed it is possible, and valuable. So, with that context out of the way, here is the Rython (R + Python) code to calculate the sample covariance estimate in parallel, with some indication of the time saved. Use it when you have large matrices and need the sample covariance matrix or a derivative thereof.
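One way to get an *exact* covariance in parallel relies on a simple identity. After centering the columns, $X_c^\top X_c = \sum_k X_k^\top X_k$ over disjoint blocks of rows $X_k$. Each worker can therefore compute its block's cross-product, and the results are summed. The Python-only sketch below illustrates that idea. It is not the post's R + Python code, and the function `parallel_cov` is hypothetical. NumPy's matrix multiply usually releases the GIL, so threads can overlap. However, NumPy's BLAS may already be multithreaded, so the actual speed-up depends on your setup.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_cov(X, n_chunks=4, max_workers=4):
    """Exact sample covariance computed from row chunks in parallel.

    Uses the identity Xc^T Xc = sum_k Xk^T Xk, where Xc is the column-centered
    data and Xk are disjoint blocks of its rows.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                        # center once (cheap, O(np))
    chunks = np.array_split(Xc, n_chunks, axis=0)  # disjoint row blocks
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(lambda C: C.T @ C, chunks))
    return sum(partials) / (n - 1)                 # same (n - 1) normalization as np.cov

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 50))
print(np.allclose(parallel_cov(X), np.cov(X, rowvar=False)))  # True
```

Because the blocks partition the rows, the sum is the full cross-product and no approximation is involved. Only floating-point summation order differs.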

Nonstandard errors?

Blog, Miscellaneous, Statistics and Econometrics | Posted on 02/14/2025

Nonstandard errors is the title of a recently published paper in the prestigious Journal of Finance by more than 350 authors. At first glance the paper appears to mix apples and oranges. At second glance, it still looks that way. To be clear, the paper is mostly what you expect from a top journal: stimulating, thought-provoking and impressive. However, my main reservation is with its conclusions and recommendations, which I think are off the mark.

AI models are NOT biased

Blog, Statistics and Econometrics | Posted on 06/11/2024

The issue of bias in AI has become a focal point in recent discussions, in academia as well as among practitioners and policymakers. I observe a lot of confusion and diffusion in those discussions. At the risk of seeming patronizing, my advice is to engage only once you understand the specific jargon being used, and particularly how it is used in this context. Misunderstandings create confusion and blur the path forward.

Here is a negative, yet typical example:

In artificial intelligence (AI)-based predictive models, bias – defined as unfair systematic error – is a growing source of concern¹.

This post tries to steer those important discussions toward the right avenues, providing some clarifications, examples of common pitfalls, and some qualified advice from experts in the field on how to approach this topic. If nothing else, I hope you find this piece thought-provoking.
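In statistics, the bias of an estimator has a narrow technical meaning, $\text{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta$, which is different from the everyday sense of "unfair". The minimal simulation below uses the standard textbook example of a variance estimator; it is illustrative and not taken from the post. Dividing by $n$ gives an estimator that is biased downward by $\sigma^2/n$, while dividing by $n-1$ removes that bias.

```python
import numpy as np

rng = np.random.default_rng(0)
true_var, n, reps = 4.0, 10, 200_000

# Draw many samples of size n and compute two variance estimators on each.
samples = rng.normal(scale=np.sqrt(true_var), size=(reps, n))
var_n = samples.var(axis=1, ddof=0)    # divides by n: biased downward
var_n1 = samples.var(axis=1, ddof=1)   # divides by n - 1: unbiased

# Bias = E[estimator] - true value, approximated by the Monte Carlo average.
print("bias (divide by n):    ", var_n.mean() - true_var)   # close to -true_var / n = -0.4
print("bias (divide by n - 1):", var_n1.mean() - true_var)  # close to 0
```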

Correlation and correlation structure (8) – the precision matrix

Blog, Finance and Trading, Statistics and Econometrics | Posted on 06/06/2024

If you are reading this, you already know that the covariance matrix represents the unconditional linear dependence between variables. Far less mentioned is the bewitching fact that the elements of the inverse of the covariance matrix (i.e. the precision matrix) encode the conditional linear dependence between the variables. This post shows why that is the case. I start with the motivation for discussing this at all, then the math, then some code.
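A standard identity makes this concrete. With precision matrix $P = \Sigma^{-1}$, the partial correlation of $X_i$ and $X_j$ given all other variables is $-P_{ij}/\sqrt{P_{ii}P_{jj}}$. The minimal sketch below uses simulated, illustrative data in which $x$ and $y$ are linked only through $z$. It checks the identity against the correlation of regression residuals. The unconditional correlation is clearly non-zero, while the conditional one is near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
z = rng.normal(size=n)
x = z + rng.normal(size=n)                 # x and y are related only through z
y = z + rng.normal(size=n)
X = np.column_stack([x, y, z])

P = np.linalg.inv(np.cov(X, rowvar=False))           # precision matrix
partial_xy = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])   # partial corr of x, y given z

def residual(target, others):
    """Residuals from an OLS regression (with intercept) of target on others."""
    A = np.column_stack([np.ones(len(target)), others])
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ beta

rx, ry = residual(x, z), residual(y, z)
print(np.corrcoef(x, y)[0, 1])                  # unconditional correlation, about 0.5
print(partial_xy, np.corrcoef(rx, ry)[0, 1])    # equal to each other, both close to 0
```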

Correlation and correlation structure (7) – Chatterjee’s rank correlation

Blog, Statistics and Econometrics | Posted on 03/20/2024

Remarkably, considering that correlation modelling dates back to 1890, statisticians still make meaningful progress in this area. A recent step forward is the paper A New Coefficient of Correlation by Sourav Chatterjee. I wrote about it shortly after it came out, and it has since garnered additional attention and follow-up results. The more I read about it, the more impressed I am. This post provides some additional details based on recent research.
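For reference, here is how Chatterjee's coefficient is computed when there are no ties. First, sort the pairs by $X$. Then let $r_i$ be the rank of $Y$ in that order. The coefficient is $\xi_n = 1 - \frac{3\sum_{i=1}^{n-1} |r_{i+1} - r_i|}{n^2 - 1}$. The minimal NumPy sketch below assumes no ties; the paper's general formula adjusts for ties. The function name `chatterjee_xi` is hypothetical. The sketch shows $\xi_n$ picking up a non-monotone functional relationship that Pearson correlation misses.

```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n(X, Y), assuming no ties in x or y."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    order = np.argsort(x)                      # rearrange pairs so x is increasing
    r = np.argsort(np.argsort(y[order])) + 1   # ranks of y (1..n) in that order
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n**2 - 1)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=2_000)
print(chatterjee_xi(x, x**2))                      # close to 1: y is a function of x
print(np.corrcoef(x, x**2)[0, 1])                  # Pearson near 0 for this symmetric relation
print(chatterjee_xi(x, rng.uniform(size=2_000)))   # close to 0: independent
```

For a perfectly monotone relationship without ties, the sum of rank jumps is $n-1$. In that case the formula gives exactly $1 - 3/(n+1)$, which tends to 1 as $n$ grows.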

Matrix Multiplication as a Linear Transformation

Blog, Statistics and Econometrics | Posted on 01/17/2024

AI algorithms are in the air. Their success is largely attributed to dimension expansion, which makes that aspect worth a closer look.

Matrix multiplication can usefully be viewed as a way to expand dimension. We begin with a brief discussion of PCA. Since PCA is predominantly used to reduce dimension, and since you are likely familiar with it already, it serves as a good springboard: a contrasting example for dimension expansion. Afterwards we show some basic algebra via code, and conclude with a citation that provides intuition for why dimension expansion is so essential.
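The shapes tell the story. Right-multiplying $n \times p$ data by a $p \times k$ matrix maps each observation from $\mathbb{R}^p$ to $\mathbb{R}^k$. PCA uses $k < p$ and projects onto the top eigenvectors, while a dimension-expanding map uses $k > p$. The minimal sketch below uses random weights as a hypothetical stand-in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))

# Dimension reduction (PCA): project centered data onto the top-2 principal directions.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # eigenvalues in ascending order
W_pca = eigvecs[:, ::-1][:, :2]        # p x 2: eigenvectors of the two largest eigenvalues
Z = Xc @ W_pca
print(Z.shape)                         # (100, 2)

# Dimension expansion: a p x k matrix with k > p maps each row into a bigger space.
W_expand = rng.normal(size=(p, 64))    # hypothetical weights standing in for learned ones
H = X @ W_expand
print(H.shape)                         # (100, 64)
print(np.linalg.matrix_rank(H))        # 5: the 64 columns are linear combinations of 5
```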

Most popular posts – 2023

Blog, R, Statistics and Econometrics | Posted on 01/02/2024

Welcome 2024.

This blog is just a personal hobby. When I'm extra busy, as I was this year, the blog is a front-line casualty, which is why 2023 saw a weaker stream of posts. Nonetheless, I am pleased with just over 30K visits this year, with an average of roughly one minute per visit (engagement time, whatever Google Analytics means by that). This year I list only the top two posts (rather than the usual three). Both have to do with statistical shrinkage.

Randomized Matrix Multiplication

Blog, Finance and Trading, Statistics and Econometrics | Posted on 12/14/2023

Matrix multiplication is a fundamental computation in modern statistics. It is at the heart of every serious AI application today. The size of the matrices nowadays is gigantic. On a good system it takes around 30 seconds to estimate the covariance of a data matrix $X \in \mathbb{R}^{10000 \times 2500}$, a small dataset by today's standards, mind you. Need to do it 10000 times? Wait for roughly 80 hours. Have larger data? Running time grows quickly: linearly in the number of observations and quadratically in the number of variables. Want a more complex operation than a covariance estimate? Forget it, or get ready to dig deep into your pockets.

We, mere minions who are unable to splurge thousands of dollars on high-end GPUs/TPUs, are left unable to work with large matrices because of the massive computational requirements; after all, who wants to wait a few weeks to discover their bug?

This post offers a solution by way of approximation, using randomization. I start with the idea, followed by a short proof, and conclude with some code and a few run-time results.
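For orientation, a classic randomized scheme approximates $AB$ by sampling $c$ of the inner indices $k$ with probabilities $p_k \propto \lVert A_{:,k}\rVert\,\lVert B_{k,:}\rVert$. It sums the sampled outer products scaled by $1/(c\,p_k)$, which makes the estimate unbiased. This scheme is the outer-product sampling of Drineas, Kannan and Mahoney, and the post's own scheme may differ. For $X_c^\top X_c$, the inner index runs over observations. The function name and sizes below are hypothetical.

```python
import numpy as np

def randomized_matmul(A, B, c, seed=0):
    """Approximate A @ B by sampling c outer products A[:, k] B[k, :] with
    probabilities proportional to ||A[:, k]|| * ||B[k, :]||, rescaled to be unbiased."""
    rng = np.random.default_rng(seed)
    weights = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    probs = weights / weights.sum()
    idx = rng.choice(A.shape[1], size=c, replace=True, p=probs)
    scale = 1.0 / np.sqrt(c * probs[idx])   # split the 1/(c p_k) factor across both sides
    return (A[:, idx] * scale) @ (B[idx, :] * scale[:, None])

rng = np.random.default_rng(1)
X = rng.normal(size=(20_000, 100))
Xc = X - X.mean(axis=0)
exact = Xc.T @ Xc
approx = randomized_matmul(Xc.T, Xc, c=2_000)   # uses 10% of the observations
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(rel_err)                                   # relative Frobenius error of the approximation
```

Cost drops from $\mathcal{O}(n p^2)$ to $\mathcal{O}(c\, p^2)$ plus sampling overhead, traded against an error that shrinks roughly like $1/\sqrt{c}$.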

Statistical Shrinkage (4) – Covariance estimation

Blog, Finance and Trading, Statistics and Econometrics | Posted on 11/29/2023

A common issue encountered in modern statistics involves the inversion of a matrix. For example, when your data is sick with multicollinearity, your estimates of the regression coefficients can bounce all over the place.

In finance we use the covariance matrix as an input for portfolio construction. Just as a variance must be positive, a covariance matrix must be positive definite to be meaningful. The focus of this post is on understanding the underlying issues with an unstable covariance matrix, identifying a practical solution for such instability, and connecting that solution to the all-important concept of statistical shrinkage. I present a strong link between three concepts: regularization of the covariance matrix, ridge regression, and measurement error bias, with some easy-to-follow math.
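When there are fewer observations than variables, the sample covariance is singular, so it cannot be inverted. A common remedy is linear shrinkage toward a scaled identity, $\hat{\Sigma} = (1-\delta)S + \delta\,\mu I$ with $\mu = \operatorname{tr}(S)/p$. This adds a multiple of the identity to the matrix to be inverted, which is also what ridge regression does to $X^\top X$. In the minimal sketch below, the intensity $\delta$ is a *hypothetical* fixed value. Data-driven choices, such as Ledoit–Wolf, exist.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 60                                 # fewer observations than variables
X = rng.normal(size=(n, p))
S = np.cov(X, rowvar=False)                   # sample covariance, rank at most n - 1

def shrink(S, delta):
    """Linear shrinkage toward a scaled identity: (1 - delta) S + delta * mu I."""
    mu = np.trace(S) / S.shape[0]             # average variance sets the target's scale
    return (1.0 - delta) * S + delta * mu * np.eye(S.shape[0])

S_shrunk = shrink(S, delta=0.3)               # delta chosen arbitrarily for illustration
print(np.linalg.matrix_rank(S))               # at most n - 1 = 39 < p, so S is singular
print(np.linalg.eigvalsh(S_shrunk).min() > 0) # True: positive definite, hence invertible
print(np.linalg.cond(S_shrunk))               # finite condition number
```

Shrinkage leaves the trace (total variance) unchanged. It lifts every eigenvalue to at least $\delta\mu$, and that lift is what stabilizes the inverse.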

Statistical Shrinkage (3)

Blog, R, Statistics and Econometrics | Posted on 11/18/2023

Imagine you're picking from 1,000 money managers. If you test just one, there's a 5% chance you might wrongly think they're great. But test 10, and your chance of at least one such error jumps to 40% ($1 - 0.95^{10} \approx 0.40$). To keep your error rate at 5%, you need to control the "family-wise error rate." One method is to set higher standards for judging a manager's talent, using a tougher t-statistic cut-off. Instead of the usual 5% cut (t-stat = 1.65), you'd use a 0.5% cut (5% divided by the 10 tests, t-stat = 2.58).

When testing 1,000 managers or strategies, the challenge grows. You'd need a manager with an extremely high t-stat of about 4 to stay within the 5% error rate. This big jump in the t-stat threshold keeps the error rate in check. However, that is discouragingly strict: a strategy with a t-stat of 4 is a rarity.
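The arithmetic behind these numbers fits in a few lines. With $m$ independent tests at level $\alpha$, the family-wise error rate is $1 - (1-\alpha)^m$. The Bonferroni correction keeps it at or below $\alpha$ by testing each hypothesis at $\alpha/m$. This sketch uses one-sided normal cut-offs from the standard library's `statistics.NormalDist` as a large-sample stand-in for t cut-offs. The helper names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05

def fwer(m, alpha=alpha):
    """Chance of at least one false discovery among m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni_cutoff(m, alpha=alpha):
    """One-sided normal cut-off that tests each of m hypotheses at level alpha / m."""
    return NormalDist().inv_cdf(1 - alpha / m)

for m in (1, 10, 1_000):
    print(f"m={m:>5}: FWER={fwer(m):.2f}, cut-off={bonferroni_cutoff(m):.3f}")
# m=    1: FWER=0.05, cut-off=1.645
# m=   10: FWER=0.40, cut-off=2.576
# m= 1000: FWER=1.00, cut-off=3.891
```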

Statistical Shrinkage (2)

Blog, Statistics and Econometrics | Posted on 08/06/2023

During 2017 I blogged about Statistical Shrinkage. At the end of that post I mentioned the important role the signal-to-noise ratio (SNR) plays in determining the need for shrinkage. This post shares some recent related empirical results published in the Journal of Machine Learning Research in the paper Randomization as Regularization. While the results are mainly for tree-based algorithms, the intuition undoubtedly extends to other numerical recipes as well.
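A textbook illustration, not taken from the JMLR paper, shows why SNR governs how much to shrink. Suppose $\theta \sim N(0, \tau^2)$ and we observe $y = \theta + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2)$. Then the MSE-optimal estimate of the form $c\,y$ uses $c = \tau^2/(\tau^2 + \sigma^2) = \text{SNR}/(1 + \text{SNR})$, where $\text{SNR} = \tau^2/\sigma^2$. The lower the SNR, the harder we should shrink toward zero. The parameter values below are hypothetical.

```python
import numpy as np

def shrink_factor(snr):
    """MSE-optimal multiplier c in c * y when signal variance / noise variance = snr."""
    return snr / (1.0 + snr)

rng = np.random.default_rng(0)
tau, sigma, n = 0.5, 1.0, 100_000             # low SNR: tau^2 / sigma^2 = 0.25
theta = rng.normal(scale=tau, size=n)          # true signals
y = theta + rng.normal(scale=sigma, size=n)    # noisy observations

c = shrink_factor(tau**2 / sigma**2)           # 0.2: shrink hard toward zero
mse_raw = np.mean((y - theta) ** 2)            # about sigma^2 = 1.0
mse_shrunk = np.mean((c * y - theta) ** 2)     # about sigma^2 * c = 0.2
print(c, mse_raw, mse_shrunk)
print([round(shrink_factor(s), 2) for s in (0.1, 1.0, 10.0)])  # [0.09, 0.5, 0.91]
```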