Convolutional Neural Networks (CNNs from here on) triumph in the field of image processing because they are designed to effectively handle strong spatial dependencies. Simply put, adjacent pixel values are close to each other, often changing only gradually from one pixel to the next. In a picture where you wear a blue shirt, all the pixels in that area of the picture are blue. You can think of a strongly autocorrelated time series, just for spatial data rather than sequential data. This post explains a few important concepts related to CNNs: sparsity of connections, parameter sharing, and hierarchical feature engineering.
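To make sparsity of connections and parameter sharing concrete, here is a minimal sketch in plain Python. It is a toy one-dimensional example, not the code from the post: the names `conv1d`, `signal` and `kernel` are made up for illustration, and the "convolution" is the sliding dot product that CNN layers typically use.

```python
def conv1d(signal, kernel):
    """Slide one small kernel over the signal (no padding, stride 1)."""
    k = len(kernel)
    # Sparsity: each output looks at only k neighbouring inputs.
    # Sharing: the same k weights are reused at every position.
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# A "blue shirt" style signal: flat, one jump, flat again.
signal = [1, 1, 1, 5, 5, 5]
kernel = [-1, 1]  # simple difference filter (hypothetical choice)

feature_map = conv1d(signal, kernel)

# Weights needed by the shared kernel versus a fully connected layer
# that maps the same 6 inputs to the same 5 outputs.
shared_weights = len(kernel)
dense_weights = len(signal) * len(feature_map)
```

The filter responds only where neighbouring values differ, and it does so with 2 weights where a fully connected layer of the same shape would need 30.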

## R tips and tricks – get the gist

In scientific programming speed is important. Functions written for general public use have a lot of control-flow checks which are not necessary if you are confident enough with your code. To quicken your code execution I suggest stripping run-of-the-mill functions to their bare bones. You can save serious wall-clock time by using only the lines of code that do the actual work. Below is a walk-through example of what I mean.

## Correlation and Correlation Structure (6) – Distance Correlation

While linear correlation (aka Pearson correlation) is by far the most common type of dependence measure, there are a few arguably better ways to characterize or estimate the degree of dependence between variables. This is a fascinating topic I keep coming back to. There is so much for a typical geek to appreciate: non-linear dependencies, whether to consider the noise in the data or just focus on the underlying process, and whether to consider the whole distribution or just a few moments.

In this post number 6 on correlation and correlation structure I share another dependency measure called *“distance correlation”*. It has been around for a while now (2009, see references). I provide just the intuition, since the math has little to do with the way distance correlation is computed, but rather with the theoretical justification for its practical legitimacy.
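For readers who like to see the computation, below is a minimal sketch of the usual sample estimator of distance correlation for two one-dimensional variables, in plain Python. It is an illustration under assumptions rather than the post's own code: the function names are made up, and the estimator is the simple (biased) version built from double-centered pairwise distances.

```python
import math


def pearson(x, y):
    """Ordinary linear (Pearson) correlation, for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def _centered_distances(v):
    """Pairwise |v_i - v_j|, with row, column and grand means removed."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row_means = [sum(row) / n for row in d]  # equal to column means (symmetry)
    grand_mean = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / n**2

    dcov2 = mean_product(a, b)  # squared distance covariance
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    return 0.0 if denom == 0 else math.sqrt(max(dcov2, 0.0) / denom)


# A purely non-linear dependence: y is a deterministic function of x.
x = [-2, -1, 0, 1, 2]
y = [v**2 for v in x]
linear = pearson(x, y)
distance = distance_correlation(x, y)
```

In this toy example the Pearson correlation is zero although `y` is fully determined by `x`, while the distance correlation is clearly positive.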

## On Writing Math

There are a lot of examples of skills that, despite being greatly needed, we never get any formal training for. At least nothing is built into our core educational programs. A few examples are: how to read well, how to listen well, or how to develop your can-do mental attitude. Writing well, in particular math-writing, is another such example. Here I share a few pointers from my own experience of reading and writing math.

## R Packages Download Stats

One big advantage of using open-source tools is the fantastic ecosystems that typically accompany them. Being able to tap into a massive open-source community, by way of downloading freely available code, is decidedly useful. But, yes, there are downsides to downloads.

For one, there are too many packages out there. There are imperfect duplicates. You can easily end up downloading an inferior code/package/module compared to existing alternatives. Second, there is a matter of security. I myself try to refrain from downloading relatively new code that is not yet tried-and-true. How do we know if a package is solid?

## Similarity and Dissimilarity Metrics – Kernel Distance

In the field of unsupervised machine learning, similarity and dissimilarity metrics (and matrices) are part and parcel. These are core components of clustering algorithms or natural language processing summarization techniques, just to name a couple.

While at first glance distance metrics look like child’s play, the fact of the matter is that when you get down to business there are a lot of decisions to make, and who likes that? To make matters worse:

- Theoretical guidance is nowhere to be found
- Your choices and decisions matter, in the sense that results materially change

After reading this post you will understand concepts like distance metrics, (dis)similarity metrics, and see why it’s fashionable to use kernels as similarity metrics.
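As a small illustration of how a kernel can double as a similarity metric, here is a sketch in plain Python. The Gaussian (RBF) kernel and the `bandwidth` parameter are assumptions made for the example; they are one common choice among many, and not necessarily the kernel the post settles on.

```python
import math


def euclidean_distance(u, v):
    """A plain distance metric: larger means less alike."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def gaussian_kernel(u, v, bandwidth=1.0):
    """Turn a distance into a similarity in (0, 1]: larger means more alike."""
    d = euclidean_distance(u, v)
    return math.exp(-(d**2) / (2 * bandwidth**2))


origin = (0.0, 0.0)
near = (0.5, 0.5)
far = (3.0, 4.0)

sim_self = gaussian_kernel(origin, origin)
sim_near = gaussian_kernel(origin, near)
sim_far = gaussian_kernel(origin, far)
```

Identical points get similarity 1, and similarity decays smoothly toward 0 as distance grows. The bandwidth controls how fast, which is exactly the kind of decision the list above complains about.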

## Hyper-Parameter Optimization using Random Search

Hyper-parameters are parameters which are not estimated as an integral part of the model. We decide on those parameters beforehand rather than estimate them within the model. Hence the name hyper-parameters, with “hyper” in the sense of “above”.

Almost all machine learning algorithms have some hyper-parameters. Data-driven choice of hyper-parameters typically means that you re-estimate the model and check performance for different hyper-parameter configurations. This adds considerable computational burden. One popular approach to setting hyper-parameters is based on a grid-search over possible values using the validation set. Faster and simpler ways to intelligently choose hyper-parameter values would go a long way in keeping the stretched computational cost at a level you can tolerate.

Enter the paper “Random Search for Hyper-Parameter Optimization” by James Bergstra and Yoshua Bengio, suggesting with a straight face not to use grid-search but instead to look for good values completely at random. This is very counterintuitive, for how can random guesses within some region compete with systematically covering the same region? What’s the story there?

Below I share the message of that paper, along with what I personally believe is actually going on (and the two are very different).
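To fix ideas, here is a minimal sketch of the two procedures side by side, given the same evaluation budget. Everything in it is hypothetical: `validation_loss` is a made-up stand-in for "re-estimate the model and score it on the validation set", and the two hyper-parameters and their unit-interval ranges are assumptions chosen for illustration only.

```python
import itertools
import random


def validation_loss(learning_rate, regularization):
    # Hypothetical toy objective, NOT a real model fit.
    return (learning_rate - 0.3) ** 2 + 0.01 * (regularization - 0.7) ** 2


def grid_search(points_per_axis):
    """Evaluate every combination on an evenly spaced grid over [0, 1]^2."""
    axis = [i / (points_per_axis - 1) for i in range(points_per_axis)]
    candidates = list(itertools.product(axis, axis))
    best = min(candidates, key=lambda c: validation_loss(*c))
    return best, len(candidates)


def random_search(n_trials, seed=0):
    """Evaluate n_trials configurations drawn uniformly from [0, 1]^2."""
    rng = random.Random(seed)
    candidates = [(rng.random(), rng.random()) for _ in range(n_trials)]
    best = min(candidates, key=lambda c: validation_loss(*c))
    return best, len(candidates)


grid_best, grid_evals = grid_search(points_per_axis=4)  # 4 x 4 = 16 fits
random_best, random_evals = random_search(n_trials=16)  # also 16 fits
```

Note one mechanical difference: the grid tries only 4 distinct values of each hyper-parameter, whereas the 16 random draws try 16 distinct values of each. Which of the two finds the better configuration depends on the objective and on the draws, so the sketch makes no claim on that front.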

## What is the Kernel Trick?

Every so often I read about the kernel trick. Each time I read about it I need to relearn what it is. Now I am thinking “Eran, don’t you have this fancy blog of yours where you write about statistics you don’t want to forget?” and then: “why indeed I do have a fancy blog where I write about statistics I don’t want to forget”. So in this post I explain the “trick” in kernel trick and why it is useful.
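As a quick memory aid, here is a minimal sketch of the idea in plain Python, using the textbook degree-2 polynomial kernel on two-dimensional inputs. The function names and the choice of kernel are assumptions made for illustration, not necessarily the example used in the post.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def feature_map(x):
    """Explicit map from 2 dimensions to 3 (the 'long way')."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def polynomial_kernel(x, z):
    """Same inner product, computed without ever building the features."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, 4.0)
long_way = dot(feature_map(x), feature_map(z))
short_way = polynomial_kernel(x, z)
```

Both routes give the same number, but the kernel never leaves the original two-dimensional space. That saving is the "trick", and it grows with the dimension of the feature space.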

## Local Linear Forests

Random forests is one of the most powerful pure-prediction algorithms; immensely popular with modern statisticians. Despite the potent performance, improvements to the basic random forests algorithm are still possible. One such improvement is put forward in a recent paper called Local Linear Forests which I review in this post. To enjoy the read you need to be already familiar with the basic version of random forests.

## Most popular posts – 2021

Kind of sad, but the same intro which served last year befits this year also.

Littered with Corona, this year was not easy. But looking around me, I feel grateful. The following quote by Socrates comes to mind:

“If all our misfortunes were laid in one common heap whence everyone must take an equal portion, most people would be content to take their own and depart.”

On topic, as with previous years I checked my website traffic-analytics. Without further ado here are the three most popular posts for 2021.

## Publication in Significance – code

A couple of months ago I published a paper in Significance: a couple of pages describing the essence of deep learning algorithms, and why they are so popular. I got a few requests for the code which generated the figures in that paper. This weekend I reviewed my code and was content to see that I used pseudorandom numbers with a seed (as opposed to random numbers without a seed). So now the figures are exactly reproducible. The actual code to produce the figures, and the figures themselves (e.g. for teaching purposes) are provided below.
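For anyone unsure why the seed matters, here is a minimal sketch in Python. It is not the code behind the figures; the function name and the seed values are arbitrary choices for illustration.

```python
import random


def simulate_draws(seed, n=5):
    """Draw n standard-normal numbers from a generator with a fixed seed."""
    rng = random.Random(seed)  # a private generator, so the seed is explicit
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


first_run = simulate_draws(seed=2021)
second_run = simulate_draws(seed=2021)  # same seed, same numbers
other_seed = simulate_draws(seed=2022)  # different seed, different numbers
```

With the seed fixed, rerunning the script reproduces the same draws, and hence the same figures.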

## A New Parameterization of Correlation Matrices

In volatility modelling, a typical challenge is to keep the covariance matrix estimate valid, meaning (1) symmetric and (2) positive semi definite^{*}. A new paper published in *Econometrica* (citing from the paper) “introduces a novel parametrization of the correlation matrix. The reparametrization facilitates modeling of correlation and covariance matrices by an unrestricted vector, where **positive definiteness is an innate property**” (emphasis mine). *Econometrica* is known to publish ground-breaking research, and you may wonder: what is the big deal in being able to reparametrise the correlation matrix?

## What’s the big idea? Deep learning algorithms

Deep learning algorithms are increasingly featuring in popular news outlets, large-scale media events and academic conferences. But what makes them so popular? Why now?

I recently published what I hope is an easy read for all of you modern-statistics ~~geeks~~ lovers; explaining the thrust behind this machine-learning class of models.

You can download the two-pager from Significance, specifically here (subscription required).

## Bootstrap Standard Error Estimates – good news

More good news for the statistical bootstrap. A new paper in the prestigious Econometrica journal makes two interesting points.

## Asking Good Questions

Recently, I was lucky enough to speak at the 7th International conference on Time Series and Forecasting (ITISE). The conference itself had an excellent collection of talks with applications in completely different fields: energy, neuroscience and, how can we not, a great deal of COVID19-related forecasting papers. It was a mix of online and in-person presentations, and with a slew of technical hiccups consuming a lot of valuable minutes, time was of the essence. Very few minutes, if any, were left for questions. I attended my first conference well over a decade ago, and my strong feeling is that things have not changed much since. There is simply not enough training when it comes to how slides should (and should not) look, how to deliver a 20-minute talk about a paper which took a year to draft, and indeed, which questions are good and which are just expensive folly.