AI models are NOT biased

The issue of bias in AI has become a focal point in recent discussions, both in the academia and amongst practitioners and policymakers. I observe a lot of confusion and diffusion in those discussions. At the risk of seeming patronizing, my advice is to engage only with the understanding of the specific jargon which is used, and particularly how it’s used in this context. Misunderstandings create confusion and blur the path forward.

Here is a negative, yet typical example:

In artificial intelligence (AI)-based predictive models, bias – defined as unfair systematic error – is a growing source of concern1.

This post tries to direct those important discussions to the right avenues, providing some clarifications, examples for common pitfalls, and some qualified advice from experts in the field on how to approach this topic. If nothing else, I hope you find this piece thought-provoking.

Correlation and correlation structure (8) – the precision matrix

If you are reading this, you already know that the covariance matrix represents unconditional linear dependency between the variables. Far less mentioned is the bewitching fact that the elements of the inverse of the covariance matrix (i.e. the precision matrix) encode the conditional linear dependence between the variables. This post shows why that is the case. I start with the motivation to even discuss this, then the math, then some code.

Correlation and correlation structure (7) – Chatterjee’s rank correlation

Remarkably, considering that correlation modelling dates back to 1890, statisticians still make meaningful progress in this area. A recent step forward is given in A New Coefficient of Correlation by Sourav Chatterjee. I wrote about it shortly after it came out, and it has since garnered additional attention and follow-up results. The more I read about it, the more I am impressed with it. This post provides some additional details based on recent research.

On Writing

Each year I supervise several data-science master’s students, and each year I find myself repeating the same advises. Situation has worsen since students started (mis)using GPT models. I therefore have written this blog post to highlight few important, and often overlooked, aspects of thesis-writing. Many points apply also to writing in general.

Matrix Multiplication as a Linear Transformation

AI algorithms are in the air. The success of those algorithms is largely attributed to dimension expansions, which makes it important for us to consider that aspect.

Matrix multiplication can be beneficially perceived as a way to expand the dimension. We begin with a brief discussion on PCA. Since PCA is predominantly used for reducing dimensions, and since you are familiar with PCA already, it serves as a good springboard by way of a contrasting example for dimension expansion. Afterwards we show some basic algebra via code, and conclude with a citation that provides the intuition for the reason dimension expansion is so essential.

Most popular posts – 2023

Welcome 2024.

This blog is just a personal hobby. When I’m extra busy as I was this year the blog is a front-line casualty. This is why 2023 saw a weaker posting stream. Nonetheless I am pleased with just over 30K visits this year, with an average of roughly one minute per visit (engagement time, whatever google-analytics means by that). This year I only provide the top two posts (rather than the usual 3). Both posts have to do with statistical shrinkage:

Randomized Matrix Multiplication

Matrix multiplication is a fundamental computation in modern statistics. It’s at the heart of all concurrent serious AI applications. The size of the matrices nowadays is gigantic. On a good system it takes around 30 seconds to estimate the covariance of a data matrix with dimensions $X_{10000 \times 2500}$, a small data today’s standards mind you. Need to do it 10000 times? wait for roughly 80 hours. Have larger data? running time grows exponentially. Want a more complex operation than covariance estimate? forget it, or get ready to dig deep into your pockets.

We, mere minions who are unable to splurge thousands of dollars for high-end G/TPUs, are left unable to work with large matrices due to the massive computational requirements needed; because who wants to wait few weeks to discover their bug.

This post offers a solution by way of approximation, using randomization. I start with the idea, followed by a short proof, and conclude with some code and few run-time results.

Statistical Shrinkage (4) – Covariance estimation

A common issue encountered in modern statistics involves the inversion of a matrix. For example, when your data is sick with multicollinearity your estimates for the regression coefficient can bounce all over the place.

In finance we use the covariance matrix as an input for portfolio construction. Analogous to the fact that variance must be positive, covariance matrix must be positive definite to be meaningful. The focus of this post is on understanding the underlying issues with an unstable covariance matrix, identifying a practical solution for such an instability, and connecting that solution to the all-important concept of statistical shrinkage. I present a strong link between the following three concepts: regularization of the covariance matrix, ridge regression, and measurement error bias, with some easy-to-follow math.

Statistical Shrinkage (3)

Imagine you’re picking from 1,000 money managers. If you test just one, there’s a 5% chance you might wrongly think they’re great. But test 10, and your error chance jumps to 40%. To keep your error rate at 5%, you need to control the “family-wise error rate.” One method is to set higher standards for judging a manager’s talent, using a tougher t-statistic cut-off. Instead of the usual 5% cut (t-stat=1.65), you’d use a 0.5% cut (t-stat=2.58).

When testing 1,000 managers or strategies, the challenge increases. You’d need a manager with an extremely high t-stat of about 4 to stay within the 5% error rate. This big jump in the t-stat threshold helps keep the error rate in check. However that is discouragingly strict: a strategy which t-stat of 4 is rarity.

Rython tips and tricks – Clipboard

For whatever reason, clipboard functionalities from Rython are under-utilized. One utility function for reversing backslashes is found here. This post demonstrates how you can use the clipboard to circumvent saving and loading files. It’s convenient for when you just want the quick insight or visual, rather than a full-blown replicable process.

Statistical Shrinkage (2)

During 2017 I blogged about Statistical Shrinkage. At the end of that post I mentioned the important role signal-to-noise ratio (SNR) plays when it comes to the need for shrinkage. This post shares some recent related empirical results published in the Journal of Machine Learning Research from the paper Randomization as Regularization. While mainly for tree-based algorithms, the intuition undoubtedly extends to other numerical recipes also.

From Excel to Script? Add Browsing Points

“If it ain’t broke, don’t fix it”.

Rython tips and tricks – Snippets

R or Python? who cares! Which editor? now that’s a different story.

I like Rstudio for many reasons. Outside the personal, Rstudio allows you to write both R + Python = Rython in the same script. Apart from that, the editor’s level of complexity is well-balanced, not functionality-overkill like some, nor too simplistic like some others. This post shares how to save time with snippets (easy in Rstudio). Snippets save time by reducing the amount of typing required, it’s the most convenient way to program copy-pasting into the machine’s memory.

In addition to the useful built-ins snippets provided by Rstudio like lib or fun for R and imp or def for Python, you can write your own snippets. Below are a couple I wrote myself that you might find helpful. But first we start with how to use snippets.

Trees 1 – 0 Neural Networks

Tree-based methods like decision trees and their powerful random forest extensions are one of the most widely used machine learning algorithms. They are easy to use and provide good forecasting performance off the cuff more or less. Another machine learning community darling is the deep learning method, particularly neural networks. These are ultra flexible algorithms with impressive forecasting performance even (and especially) in highly complex real-life environments.

This post is shares:

• Two academic references lauding the powerful performance of tree-based methods.
• Because both neural networks and tree-based methods are able to capture non-linearity in the data, it’s not easy to choose between them. Those references help form an opinion with regards to when one should use neural networks and when tree-based methods are preferable, if you don’t have time to implement both (which is usually the case).

Most popular posts – 2022

Welcome 2023.

As per usual this point in time, I check my blog’s traffic-analytics to see which were the most popular pieces last year. Without further ado..