Kaggle Experience

A typical data scientist spends at least part of the time on forecasting and prediction. Kaggle is a platform which hosts a slew of competitions. Those who have the time, energy and know-how to combat real-life problems huddle together to test their talent. I highly recommend the experience. A side effect of tackling actual problems (rather than those which appear in textbooks) is that most of the time you are not enjoying wonderful new insights or exploring fascinating, unfamiliar, ground-breaking algorithms. Rather, you are handling/wrangling/manipulating data, which is ugly and boring, but necessary and useful.

I tried my hand a few years ago, and again about 6 months ago in one of those competitions, the Toxic Comment Classification Challenge. Here are my thoughts on that short experience and some insights from scraping the results of that competition.

Continue reading

Curse of dimensionality part 3: Higher-Order Comoments

Higher moments such as skewness and kurtosis are not explored as much as they should be.

These moments are crucial for managing portfolio risk; they are at least as important as volatility, if not more so. Skewness relates to asymmetry risk and kurtosis relates to tail risk.

Despite their great importance, those higher moments enjoy only a small portion of the attention given to their lower, more friendly moments: the mean and the variance. In my opinion, one reason for this may be how hard it is to estimate those moments, to estimate them accurately that is.

It is yet another situation where the curse of dimensionality rears its enchanting head (and an idea for a post is born).
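
To get a feel for why the dimension bites, here is a minimal sketch (my own illustration, not code from the post): counting how many distinct comoment terms you must estimate for a portfolio of p assets, using the standard count of unique entries in a symmetric tensor.

```python
# Counting distinct comoment terms for p assets (illustrative sketch).
from math import comb

def n_unique_comoments(p: int, order: int) -> int:
    """Distinct entries of a symmetric comoment tensor of a given order."""
    return comb(p + order - 1, order)

for p in (5, 20, 50):
    print(p,
          n_unique_comoments(p, 2),   # covariance terms
          n_unique_comoments(p, 3),   # coskewness terms
          n_unique_comoments(p, 4))   # cokurtosis terms

# Already at 50 assets there are hundreds of thousands of cokurtosis terms
# to estimate, typically from only a few thousand return observations.
```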

Continue reading

Understanding Kullback-Leibler Divergence

It is easy to measure the distance between two points. But what about measuring the distance between two distributions? Good question. Long answer. Welcome the Kullback-Leibler divergence measure.

The motivation for thinking about the Kullback-Leibler divergence measure is that it lets you tackle questions such as: “how different was the behavior of the stock market this year compared with its average behavior?”. This is a rather different question than the trivial “how did the return this year compare with the average return?”.
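
For the discrete case, the divergence is just a weighted log-ratio of the two probability vectors. A minimal sketch (my own toy numbers, not from the post):

```python
# Discrete Kullback-Leibler divergence D(P || Q) = sum_i p_i * log(p_i / q_i).
import numpy as np

def kl_divergence(p, q):
    """KL divergence between two discrete distributions, in nats."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                       # terms with p_i = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical bin probabilities over the same return buckets:
p_this_year = [0.10, 0.25, 0.30, 0.25, 0.10]   # "this year's" behavior
q_average   = [0.05, 0.20, 0.50, 0.20, 0.05]   # the "average" behavior
print(kl_divergence(p_this_year, q_average))   # positive
```

Note the asymmetry: D(P‖Q) is generally not equal to D(Q‖P), which is why it is called a divergence rather than a proper distance.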

Continue reading

LASSO, LASSO, LASSO

LASSO stands for Least Absolute Shrinkage and Selection Operator. It was first introduced 21 years ago by Robert Tibshirani (Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society, Series B). In 2004 the four statistical masters, Efron, Hastie, Johnstone and Tibshirani, joined together to write the paper Least Angle Regression, published in the Annals of Statistics. It is that paper that sent the LASSO to the podium. The reason? They removed a computational barrier. Armed with a new, ingenious geometric interpretation, they presented an algorithm for solving the LASSO problem. The algorithm is as simple as solving an OLS problem, and with computer code to accompany their paper, the LASSO was set for its liftoff*.

Overall, the LASSO reduces model complexity. It does this by completely excluding some variables, using only a subset of the original potential explanatory variables. Since this can add to the story of the model, the reduction in complexity is a desired property. The clarity of the authors' exposition and well-rehashed computer code are further reasons for the fully justified, full-fledged LASSO flare-up.
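
As a quick reminder of that subset-selection behavior, here is a minimal sketch (assuming scikit-learn, not code from the paper or the post): on toy data where only three of twenty variables matter, the LASSO zeroes out most of the coefficients.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]                   # only 3 of 20 variables matter
y = X @ beta + rng.standard_normal(n)

fit = Lasso(alpha=0.1).fit(X, y)
print("non-zero coefficients:", np.flatnonzero(fit.coef_))   # a sparse subset
```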

This is not a LASSO tutorial. Google-search results, undoubtedly refined over years of increased popularity, are clear enough by now. Also, if you are still reading this, I imagine you already know what the LASSO is and how it works. To continue from this point, what follows is a selective list of milestones from the academic literature: some theoretical and practical extensions.

Continue reading

Density Estimation Using Regression

Density estimation using regression? Yes we can!

I like regression. It is one of those simple yet powerful statistical methods. You always know exactly what you are doing. This post is about density estimation, and how to get an estimate of the density using (Poisson) regression.
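
As a rough sketch of the general idea (my own illustration of the usual binning approach, not necessarily the exact recipe in the post): bin the data, model the bin counts with a Poisson regression on the bin centers, and rescale the fitted counts into a density.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=2000)                          # toy sample

counts, edges = np.histogram(x, bins=30)           # bin the data
centers = 0.5 * (edges[:-1] + edges[1:])
width = edges[1] - edges[0]

# Poisson GLM: log E[count] modeled as a polynomial in the bin center
design = sm.add_constant(np.column_stack([centers, centers**2, centers**3]))
fit = sm.GLM(counts, design, family=sm.families.Poisson()).fit()

density_hat = fit.fittedvalues / (len(x) * width)  # counts -> density scale
print(density_hat.round(3))
```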

Continue reading

Computer Age Statistical Inference – now free

If you consider yourself an econometrician/statistician, or one of those numerous buzzword synonyms that are floating around these days, Computer Age Statistical Inference: Algorithms, Evidence and Data Science by Bradley Efron and Trevor Hastie is a book you can't miss, and now you don't have to: you can download the book for free.

My first inclination is to deliver an unequivocal recommendation, but in truth my praise would probably fall short of what has already been written.

So what can I give you? I can say that there are currently six Amazon reviews, with a 4.5 average. One of the reviewers writes that there is some overlap with previous work. I agree, but it doesn't matter. It reads so well; call it a refresher. Let's face it, it is not as if everything is always so clear in your head that you can afford to skip sections just because you have read something similar before.

I can also tell you why I think it is a special book.

Continue reading

Statistical Shrinkage

Shrinkage in statistics has grown in popularity over the decades. By now, statistical shrinkage is commonplace, applied either explicitly or implicitly.

But when do we actually need to make use of shrinkage? At least in part, it depends on the signal-to-noise ratio.
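
A minimal toy sketch of that point (my own numbers, not from the post): when the individual estimates are mostly noise, pulling them toward the grand mean reduces the estimation error.

```python
import numpy as np

rng = np.random.default_rng(2)
true_means = rng.normal(0, 0.5, size=50)            # weak signal per unit
noisy_est  = true_means + rng.normal(0, 2.0, 50)    # strong noise per unit

def shrink(est, weight):
    """Convex combination of each estimate with the grand mean."""
    return weight * est + (1 - weight) * est.mean()

for w in (1.0, 0.5, 0.2):                           # w = 1 means no shrinkage
    mse = np.mean((shrink(noisy_est, w) - true_means) ** 2)
    print(f"shrinkage weight {w}: MSE {mse:.2f}")
```

The noisier the individual estimates relative to the spread of the true means, the further the best weight moves away from 1, i.e. the more shrinkage pays off.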

Continue reading

Understanding False Discovery Rate

False Discovery Rate is an unintuitive name for a very intuitive statistical concept. The math involved is as elegant as possible. Still, it is not an easy concept to actually understand. Hence I thought it would be a good idea to write this short tutorial.

We reviewed this important topic in the past: here as one of three Present-day great statistical discoveries, here in the context of backtesting trading strategies, and here in the context of scientific publishing. This post targets the casual reader, explaining the concept of False Discovery Rate in plain words.
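
For the impatient, here is a minimal sketch of the Benjamini-Hochberg step-up procedure that controls the False Discovery Rate (my own code and toy p-values; the post itself explains the concept in words).

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.10):
    """Boolean mask of discoveries, controlling the FDR at level q."""
    p = np.asarray(pvals, float)
    m = len(p)
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m        # q * i / m
    below = p[order] <= thresholds
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True                        # reject the k smallest p-values
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.3, 0.74]
print(benjamini_hochberg(pvals, q=0.10))
```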

Continue reading

Understanding K-Means Clustering

Introduction

Google “K-means clustering”, and you usually find ugly explanations and math-heavy, sensational formulas*. It is my opinion that you can only understand those explanations if you don't need them; meaning you are already familiar with the topic. Therefore, this is a more gentle introduction to K-means clustering. Here you will find out what the K-means clustering algorithm actually does. You will get only the basics, but in this particular topic the extensions are not wildly different.
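
To make "what it actually does" concrete, here is a bare-bones sketch of the two alternating steps (my own NumPy code on toy data; edge cases such as empty clusters are ignored).

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # random start
    for _ in range(n_iter):
        # step 1: assign each point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step 2: move each center to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (0, 3, 6)])
labels, centers = kmeans(X, k=3)
print(centers.round(2))
```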

Continue reading

Outliers and Loss Functions

A few words about outliers

In statistics, outliers are as thorny a topic as it gets. Is it legitimate to treat the observations seen during the global financial crisis as outliers? Or are those simply a feature of the system, and as such an integral part of a very fat-tailed distribution?
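
A tiny toy example of why the loss function matters here (my own made-up numbers): the squared-loss minimizer, the mean, is dragged by a single crisis-like observation, while the absolute-loss minimizer, the median, barely moves.

```python
import numpy as np

returns = np.random.default_rng(3).normal(0.0, 0.01, size=250)   # calm period
crisis = np.append(returns, -0.30)                               # one extreme day

print("mean   before/after:", returns.mean().round(4), crisis.mean().round(4))
print("median before/after:", np.median(returns).round(4), np.median(crisis).round(4))
```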

Continue reading

Density Confidence Interval

Density estimation belongs to the literature on non-parametric statistics. Using simple bootstrapping techniques we can obtain confidence intervals (CIs) for the whole density curve. Here is a quick and easy way to obtain CIs for different risk measures (VaR, expected shortfall), and using what follows you can answer all kinds of relevant questions.
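
A quick sketch of the bootstrap step (my own code and conventions: 95% VaR and expected shortfall on the left tail of toy, fat-tailed returns): resample the returns, recompute the risk measure each time, and read off percentile confidence intervals.

```python
import numpy as np

rng = np.random.default_rng(4)
returns = rng.standard_t(df=4, size=1000) * 0.01    # fat-tailed toy returns

def var_es(x, level=0.05):
    var = np.quantile(x, level)                     # 95% Value-at-Risk
    es = x[x <= var].mean()                         # expected shortfall
    return var, es

boot = np.array([var_es(rng.choice(returns, size=len(returns), replace=True))
                 for _ in range(2000)])
ci = np.percentile(boot, [2.5, 97.5], axis=0)
print("VaR CI:", ci[:, 0].round(4), " ES CI:", ci[:, 1].round(4))
```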

Continue reading

Trim your mean

The mean is arguably the most commonly used measure of central tendency. No, no, don't fall asleep! Important point ahead.

We routinely compute the average as an estimate of the mean. All else constant, how much return should we expect the S&P 500 to deliver over some period? The average of past returns is a good answer. The average is the maximum likelihood (ML) estimate under Gaussianity. The average is a special case of least-squares minimization (a regression with no explanatory variables). It is a good answer. BUT:
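
As a quick aside before the "but": the fix the title hints at is a trimmed average, which simply drops a fraction of the most extreme observations from each side before averaging. A minimal sketch with made-up, fat-tailed returns (assuming SciPy is available):

```python
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(5)
returns = rng.standard_t(df=3, size=500) * 0.01    # fat-tailed toy returns

print(f"plain average      : {returns.mean():.5f}")
print(f"10% trimmed average: {trim_mean(returns, 0.10):.5f}")
```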

Continue reading

Optimism of the Training Error Rate

We all use models. We are all continuously working to improve and validate our models. Constant effort goes into estimating how good our models actually are.

A general term for this estimate is the error rate. A low error rate is better than a high one; it means our model is more accurate.
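
A minimal toy sketch of that optimism (my own code, assuming scikit-learn): the error measured on the same data used to fit the model is systematically lower than the error on fresh data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
n, p = 100, 30
X, X_new = rng.standard_normal((n, p)), rng.standard_normal((n, p))
beta = rng.standard_normal(p) * 0.2
y     = X @ beta + rng.standard_normal(n)
y_new = X_new @ beta + rng.standard_normal(n)

fit = LinearRegression().fit(X, y)
train_mse = np.mean((y - fit.predict(X)) ** 2)          # in-sample error
fresh_mse = np.mean((y_new - fit.predict(X_new)) ** 2)  # out-of-sample error
print("training MSE:", round(float(train_mse), 2),
      " fresh-data MSE:", round(float(fresh_mse), 2))
```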

Continue reading