Theory - Eran Raviv

Understanding Word Embeddings (2) – Geometry

Blog, Statistics and EconometricsPosted on 06/02/2025

I have noticed that when I use the term “coordinates” to talk about vectors, it doesn’t always click for everyone in the room. The previous post covered the algebra of word embeddings and now we explain why you should think about the vector of the word embedding simply as coordinates in space. We skip the trivial 1D and 2D cases since they are straightforward. 4 dimensions is too complicated for me to gif around with, so 3D dimensions would have to suffice for our illustrations.

Understanding Word Embeddings (1) – Algebra

Blog, Statistics and EconometricsPosted on 05/07/2025

Some time back I took the time to explain that matrix multiplication can be viewed as a linear transformation. Having that perspective helps to grasp the inner-workings of all AI models across various domains (audio, images, etc.). Building on that, these next couple of posts will help you understand the inputs used in these matrix multiplication operations, specifically for those who want to understand how text-based models and LLMs function. Our focus is on the infamous one-hot encoding, as it is the key to unlock the underlying theory. It will provide you, I hope, the often-illusive intuition behind word-embeddings.

Understanding Spectral Clustering

Blog, Statistics and EconometricsPosted on 09/20/2020

Some problems are linear, but some problems are non-linear. I presume that you started your education discussing and solving linear problems which is a natural starting point. For non-linear problems solutions often involve an initial processing step. The aim of that initial step is to transform the problem such that it has, again, linear flavor.

A textbook example is the logistic regression, a tried-and-true recipe for getting the best linear boundary between two classes. In a standard neural network model, you will find logistic regression (or multinomial regression for multi-class output) applied on transformed data. Few preceding layers are “devoted” to transform a non-separable input-space into something which linear methods could handle, allowing the logistic regression to solve the problem with relative ease.

The same rationale holds for spectral clustering. Rather than working with the original inputs, work first with a transformed data which would make it easier to solve, and then link back to your original inputs.

Spectral clustering is an important and up-and-coming variant of some fairly standard clustering algorithms. It is a powerful tool to have in your modern statistics tool cabinet. Spectral clustering includes a processing step to help solve non-linear problems, such that they could be solved with those linear algorithms we are so fond of. For example, the undeniably popular K-means.

Laws of large numbers

Blog, Statistics and EconometricsPosted on 05/30/2016

The laws of large numbers are the cornerstones of asymptotic theory. ‘Large numbers’ in this context does not refer to the value of the numbers we are dealing with, rather, it refers to a large number of repetitions (or trials, or experiments, or iterations). This post takes a stab at explaining the difference between the strong law of large numbers (SLLN) and the weak law of large numbers (WLLN). I think it is important, not amply clear to most, and I will need it as a reference in future posts.