In volatility modelling, a typical challenge is to keep the covariance matrix estimate valid, meaning (1) symmetric and (2) positive semi definite^{*}. A new paper published in *Econometrica* (citing from the paper) “introduces a novel parametrization of the correlation matrix. The reparametrization facilitates modeling of correlation and covariance matrices by an unrestricted vector, where **positive definiteness is an innate property**” (emphasis mine). *Econometrica* is known to publish ground-breaking research, and you may wonder: what is the big deal in being able to reparametrise the correlation matrix?

## What’s the big idea? Deep learning algorithms

Deep learning algorithms are increasingly featuring in popular news outlets, large-scale media events and academic conferences. But what makes them so popular? Why now?

I recently published what I hope is an easy read for all of you modern-statistics ~~geeks~~ lovers; explaining the thrust behind this machine-learning class of models.

You can download the two-pager from Significance, specifically here (subscription required).

## Bootstrap Standard Error Estimates – good news

More good news for the statistical bootstrap. A new paper in the prestigious Econometrica journal makes two interesting points.

## Asking Good Questions

Recently, I was lucky enough to speak at the 7th International conference on Time Series and Forecasting (ITISE). The conference itself had excellent collection of talks with a applications in completely different fields. Energy, neuroscience and, how can we not, a great deal of COVID19-related forecasting papers. It was a mix of online and in-person presentations, and with a slew of technical hiccups consuming a lot of valuable minutes time was of the essence. Very few minutes, if any, for questions. I attended my first conference well over a decade ago, and my strong feeling is that things have not changed much since. There is simply not enough training when it comes to the way slides should (and should not) look like, how to deliver a 20 minutes talk about a paper which took a year to draft, and indeed, which questions are good and which are just expensive folly.

## R tips and tricks – shell.exec

When you startup your machine, the first thing you do is to open the various programs you work with. Examples: your note-taking program, the pdf file that you need to read, the ppt file you were last working on, and of course your strongest link with the outside world nowadays; your email box. This post shows how to automate this process. Windows machines notoriously need restarting for every little (un)install. I trust you will find this startup automation advice handy.

## Bayesian vs. Frequentist in Practice, part 3

This post is inspired by Leo Breiman’s opinion piece “No Bayesians in foxholes”. The saying “there are no atheists in foxholes” refers to the fact that if you are in the foxhole (being bombarded..), you pray! Leo’s paraphrase indicates that when complex, real problems are present, there are no Bayesian to be found.

## Random forest importance measures are NOT important

Random Forests (RF from here onwards) is a widely used pure-prediction algorithm. This post assumes good familiarity with RF. If you are not familiar with this algorithm, stop here and see the first reference below for an easy tutorial. If you used RF before and you are familiar with it, then you probably encountered those “importance of the variables” plots. We start with a brief explanation of those plots, and the concept of *importance scores* calculation. Main takeaway from the post: don’t use those *importance scores* plots, because they are simply misleading. Those importance plots are simply a wrong turn taken by our human tendency to look for reason, whether it’s there or it’s not there.

## R tips and tricks – readClipboard

Here is a small utility function to save you some boring work.

Say you have a file to read into R. The file path is `C:\Users\folder1\folder2\folder3\mydata.csv`

. So what do you do? you copy the path, paste it to the editor, and start reversing the backslash into a forward slash so that R can read your file.

With the help of the `rstudioapi`

package, the `readClipboard`

function and the following function:

1 2 3 4 5 6 7 |
get_path <- function(){ x <- readClipboard(raw= F) rstudioapi::insertText( paste("#",x, "\n") ) x } |

You can

1. Simply copy the path `C:\Users\folder1\folder2\folder3\mydata.csv`

2. execute `pathh <- get_path()`

3. use `pathh`

which is now R-ready.

No more reversing or escaping backslash.

## Beta in the tails

Every form of strength is also a form of weakness^{*}. I love statistics, but I focus to much on methodology, which is not for everyone. Some people (right or wrong) question: “wonderful sir, but what can I do with it?”.

A new paper titled *“Beta in the tails”* is a showcase application for why we should focus on correlation structure rather than on average correlation. They discuss the question: *Do hedge funds hedge?* The reply: No, they don’t!

The paper *“Beta in the tails”* was published in the *Journal of Econometrics* but you can find a link to a working paper version below. We start with a figure replicated from the paper, go through the meaning and interpretation of it, and explain the methods used thereafter.

## How flexible neural networks really are?

Very!

A distinctive power of neural networks (neural nets from here on) is their ability to flex themselves in order to capture complex underlying data structure. This post shows that the expressive power of neural networks can be quite swiftly taken to the extreme, in a bad way.

What does it mean? A paper from 1989 (universal approximation theorem, reference below) shows that any reasonable function can be approximated arbitrarily well by fairly a shallow neural net.

Speaking freely, if one wants to abuse the data, to overfit it like there is no tomorrow, then neural nets is the way to go; with neural nets you can perfectly map your fitted values to any data shape. Let’s code an example and explain the meaning of this.

## Correlation and correlation structure (5) – a new coefficient of correlation

This is the fifth post which is concerned with quantifying the dependence between variables. When talking correlations one usually thinks about linear correlation, aka Pearson’s correlation. One serious limitation of linear correlation is that it’s, well.. linear. By construction it’s not useful for detecting non-monotonic relation between variables. Here I share some recent academic research, a new way to detect associations that are **not** monotonic.

## Understanding Variance Explained in PCA – Matrix Approximation

Principal component analysis (PCA from here on) is performed via linear algebra functions called eigen decomposition or singular value decomposition. Since you are actually reading this, you may well have used PCA in the past, at school or where you work. There is a strong link between PCA and the usual least squares regression (previous posts here and here). More recently I explained what does variance explained by the first principal component actually means.

This post offers a matrix approximation perspective. As a by-product, we also show how to compare two matrices, to see how different they are from each other. Matrix approximation is a bit math-hairy, but we keep it simple here I promise. For this fascinating field itself I suspect a rise in importance. We are constantly stretching what we can do computationally, and by using approximations rather than the actual data, we can ease that burden. The price for using approximation is decrease in accuracy (à la “garbage in garbage out”), but with good approximation the tradeoff between the accuracy and computational time is favorable.

## R tips and tricks – Timing and profiling code

Modern statistical methods use simulations; generating different scenarios and repeating those thousands of times over. Therefore, even trivial operations burden computational speed.

In the words of my favorite statistician Bradley Efron:

“There is some sort of law working here, whereby statistical methodology always expands to strain the current limits of computation.”

In addition to the need for faster computation, the richness of open-source ecosystem means that you often encounter different functions doing the same thing, sometimes even under the same name. This post explains how to measure the computational efficacy of a function so you know which one to use, with a couple of actual examples for reducing computational time.

## Most popular posts – 2020

Littered with Corona, this year was not easy. But looking around me, I feel grateful. The following quote by Socrates comes to mind:

“If all our misfortunes were laid in one common heap whence everyone must take an equal portion, most people would be content to take their own and depart.”

On topic, as with previous years I checked my analytics so as to let you know which posts got the most attention. Without further ado here are the three most popular posts for this year.

## Why complex models are data-hungry?

If you regularly read this blog then you know I am not one to jump on the “AI Bandwagon”, being quickly weary of anyone flashing the “It’s Artificial Intelligence” joker card. Don’t get me wrong, I understand it is a sexy term I, but to me it always feels a bit like a sales pitch.

If the machine does anything (artificially) intelligent it means that the model at the back is complex, and complex models need massive (**massive** I say) amounts of data. This is because of the infamous Curse of dimensionality.

I know it. You know it. Complex models need a lot of data. You have read this fact, even wrote it at some point. But why is it the case? “So we get a good estimate of the parameter, and a good forecast thereafter”, you reply. I accept. But.. what is it about simple models that they could suffice themselves with much less data compared to complex models? Why do I always recommend to start simple? and why the literature around shrinkage and overfitting is as prolific as it is?