Blog - 8/16 - Eran Raviv

Understanding False Discovery Rate

Blog, Statistics and EconometricsPosted on 04/04/2017

False Discovery Rate is an unintuitive name for a very intuitive statistical concept. The math involved is as elegant as possible. Still, it is not an easy concept to actually understand. Hence i thought it would be a good idea to write this short tutorial.

We reviewed this important topic in the past, here as one of three Present-day great statistical discoveries, here in the context of backtesting trading strategies, and here in the context of scientific publishing. This post target the casual reader, explaining the concept of False Discovery Rate in plain words.

Understanding K-Means Clustering

Blog, Statistics and EconometricsPosted on 03/12/2017

Introduction

Google “K-means clustering”, and you usually you find ugly explanations and math-heavy sensational formulas*. It is my opinion that you can only understand those explanations if you don’t need them; meaning you are already familiar with the topic. Therefore, this is a more gentle introduction to K-means clustering. Here you will find out what K-Means Clustering, an algorithm, actually does. You will get only the basics, but in this particular topic, the extensions are not wildely different.

R tips and tricks – the locator function

Blog, MiscTips, RPosted on 03/08/2017

How many times have you placed the legend in R plot to discover it is being overrun by some points or lines in the chart? Usually what comes next is a trial-and-error phase where you adjust the location, changing the arguments of the x and y coordinates, and re-drawing the plot again to check if the legend or text are now positioned such that they are fully readable.

Outliers and Loss Functions

Blog, Risk, Statistics and EconometricsPosted on 02/19/2017

A few words about outliers

In statistics, outliers are as thorny topic as it gets. Is it legitimate to treat the observations seen during global financial crisis as outliers? or are those simply a feature of the system, and as such are integral part of a very fat tail distribution?

R tips and tricks – Set Working Directory

Blog, Code, RPosted on 02/08/2017

This is more an Rstudio tip than an R tip. It would be nice to know how the following works for different editors, but Rstudio is common enough and awesome enough for the following to be relevant.

Density Confidence Interval

Blog, Finance and Trading, Risk, Statistics and EconometricsPosted on 01/25/2017

Density estimation belongs with the literature of non-parametric statistics. Using simple bootstrapping techniques we can obtain confidence intervals (CI) for the whole density curve. Here is a quick and easy way to obtain CI’s for different risk measures (VaR, expected shortfall) and using what follows, you can answer all kind of relevant questions.

Most popular posts – 2016

BlogPosted on 12/28/2016

Another year. Looking at my google analytics reports I can’t help but wonder how is it that I am so bad in predicting which posts would catch audience attention. Anyhow, top three for 2016 are:

– On the 60/40 portfolio mix
– The case for Regime-Switching GARCH
– Most popular machine learning R packages

And my personal favorites:
– ASA statement on p-values
– Why bad trading strategies may perform well? Mathematical explanation

It is also an opportunity to say thank you, and to wish you a happy and productive 2017.

Trim your mean

Blog, Risk, Statistics and EconometricsPosted on 12/18/2016

The mean is arguably the most commonly used measure for central tendency, no no, don’t fall asleep! important point ahead.

We routinely compute the average as an estimate for the mean. All else constant, how much return should we expect the S&P 500 to deliver over some period? the average of past returns is a good answer. The average is the Maximum Likelihood (ML) estimate under Gaussianity. The average is a private case of least square minimization (a regression with no explanatory variables). It is a good answer. BUT:

R tips and tricks – Faster Loops

Blog, Code, RPosted on 12/12/2016

Insert or bind?

This is the first in a series of planned posts, sharing some R tips and tricks. I hope to cover topics which are not easily found elsewhere. This post has to do with loops in R. There are two ways to save values when looping:
1. You can predefine a vector and fill it, or
2. you can recursively bind the values.

Which one is faster?

Optimism of the Training Error Rate

Blog, Statistics and EconometricsPosted on 12/04/2016

We all use models. We all continuously working to improve and validate our models. Constant effort is made trying to estimate: how good our model actually is?

A general term for this estimate is error rate. Low error rate is better than high error rate, it means our model is more accurate.

On Central Moments

Blog, Risk, Statistics and EconometricsPosted on 11/14/2016

Sometimes I read academic literature, and often times those papers contain some proofs. I usually gloss over some innocent-looking assumptions on moments’ existence, invariably popping before derivations of theorems or lemmas. Here is one among countless examples, actually taken from Making and Evaluating Point Forecasts:

Modeling Tail Behavior with EVT

Blog, Risk, Statistics and EconometricsPosted on 09/19/2016

Extreme Value Theory (EVT) and Heavy tails

Extreme Value Theory (EVT) is busy with understanding the behavior of the distribution, in the extremes. The extreme determine the average, not the reverse. If you understand the extreme, the average follows. But, getting the extreme right is extremely difficult. By construction, you have very few data points. By way of contradiction, if you have many data points then it is not the extreme you are dealing with.

Good coding practices – part 2

Blog, CodePosted on 09/12/2016

Introduction

In part 1 of Good coding practices we considered how best to code for someone else, may it be a colleague who is coming from Excel environment and is unfamiliar with scripting, a collaborator, a client or the future-you, the you few months from now. In this second part, I give some of my thoughts on how best to write functions, the do’s and dont’s.

Multivariate Volatility Forecast Evaluation

Blog, Finance and Trading, Risk, Statistics and EconometricsPosted on 09/01/2016

The evaluation of volatility models is gracefully complicated by the fact that, unlike other time series, even the realization is not observable. Two researchers would never disagree about what was yesterday’s stock price, but they can easily disagree about what was yesterday’s stock volatility. Because we don’t observe volatility directly, each of us uses own proxy of choice. There are many ways to skin this cat (more on volatility proxy here).

In a previous post Univariate volatility forecast evaluation we considered common ways in which we can evaluate how good is our volatility model, dealing with one time-series at a time. But how do we evaluate, or compare two models in a multivariate settings, with two covariance matrices?

Why bad trading strategies may perform well? Mathematical explanation

Blog, Finance and Trading, Risk, Statistics and EconometricsPosted on 08/12/2016

You probably know that even a trading strategy which is actually no different from a random walk (RW henceforth) can perform very well. Perhaps you chalk it up to short-run volatility. But in fact there is a deeper reason for this to happen, in force. If you insist on using and continuously testing a RW strategy, you will find, at some point with certainty, that it has significant outperformance.

This post explains why is that.