Blog - 7/16 - Eran Raviv

Understanding Kullback – Leibler Divergence

Blog, Statistics and Econometrics | Posted on 12/09/2017

It is easy to measure the distance between two points. But what about measuring the distance between two distributions? Good question. Long answer. Welcome the Kullback – Leibler Divergence measure.

The motivation for thinking about the Kullback – Leibler Divergence measure is that you can tackle questions such as: “how different was the behavior of the stock market this year compared with the average behavior?” This is a rather different question from the trivial “how was the return this year compared to the average return?”
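To make the idea concrete, here is a minimal sketch of the divergence for two discrete distributions, using the standard definition D(P || Q) = sum of p_i * log(p_i / q_i). The three bins and the numbers in `this_year` and `average_year` are made up for illustration; they are not taken from any data in the post.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) for discrete distributions, in nats."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a bin with no mass under P contributes nothing
        if qi == 0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total

# Hypothetical shares of daily returns in three bins: down, flat, up
this_year = [0.5, 0.2, 0.3]
average_year = [0.3, 0.4, 0.3]

print(kl_divergence(this_year, average_year))
print(kl_divergence(average_year, this_year))  # a different number
```

The two printed values differ because the measure is not symmetric, so it is not a distance in the strict metric sense; the order of the arguments matters.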

R tips and tricks – the pipe operator

Blog, Code, R | Posted on 11/13/2017

The R language has improved over the years. Amidst numerous splendid augmentations, the magrittr package by Stefan Milton Bache allows us to write more readable code. It uses an ingenious piping convention which will be explained shortly. This post talks about when to use those pipes, and when to avoid using pipes in your code. I am all about ~~that bass~~ readability, but I am also about speed. Use the pipe operator, but watch the tradeoff.

Bitcoin investing

Blog, Finance and Trading | Posted on 10/29/2017

Bitcoin is a cryptocurrency created in 2008. I have never belonged to team “gets it” when it comes to Bitcoin investing, but perhaps the time has come to reconsider.

R tips and tricks – boxplots for large data

Blog, R | Posted on 10/03/2017

Admit it, you always thought there was something off with how boxplots look. You can tell there should be some way to depict more information; they simply look much too spacious. Evidently you are not the only one. Many have tried to suggest better ways to plot the same information. See, for example, 40 years of boxplots.

R vs MATLAB – round 4

Blog, Code | Posted on 09/06/2017

This is another comparison between R and MATLAB (with Python also in the mix this time). In previous rounds we discussed the differences in 3D visualization, in syntax, and in input-output. Today is about computational speed.

Visualizing Tail Risk

Blog, Finance and Trading, Risk | Posted on 08/07/2017

Tail risk conventionally refers to the risk of a large and sharp drawdown of the portfolio. How large is subjective and depends on how you define the tail.

A lot of research is directed towards obtaining a good estimate of the tail risk. Some fairly recent research also indicates that investors perceive tail risk as a stand-alone risk to be compensated for, rather than bundled together with the usual variability of the portfolio. So this risk now gets even more attention.
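One simple way to put a number on the tail, shown here only as a sketch, is the historical expected shortfall: the average of the worst share `alpha` of observed returns. The function name, the choice of `alpha`, and the `returns` series below are illustrative assumptions, not the estimator discussed in the post.

```python
def expected_shortfall(returns, alpha=0.05):
    """Average of the worst `alpha` share of returns (always at least one observation)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    ordered = sorted(returns)  # worst returns first
    # Small tolerance guards against floating-point results like 1.9999999
    k = max(1, int(len(ordered) * alpha + 1e-9))
    tail = ordered[:k]
    return sum(tail) / k

# Hypothetical series of 20 periodic portfolio returns
returns = [
    0.012, -0.004, 0.007, -0.021, 0.015, 0.003, -0.062, 0.009, 0.001, -0.011,
    0.018, -0.035, 0.006, 0.004, -0.008, 0.011, -0.002, 0.020, -0.015, 0.005,
]

print(expected_shortfall(returns, alpha=0.05))  # tail = the single worst return
print(expected_shortfall(returns, alpha=0.10))  # tail = the two worst returns
```

Changing `alpha` changes what counts as the tail, which is exactly the subjective part of the definition.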

R tips and tricks – Package Dependencies

Blog, Code | Posted on 07/27/2017

In an earlier post about the most popular machine learning R packages I showed the incredible, exponential growth of R, measured by the number of package downloads. Here is another graph, which shows more linear growth in R (and impressive growth in Python) as measured by the percentage of questions posted on Stack Overflow.

LASSO, LASSO, LASSO

Blog, Statistics and Econometrics | Posted on 07/05/2017

LASSO stands for Least Absolute Shrinkage and Selection Operator. It was first introduced 21 years ago by Robert Tibshirani (Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B). In 2004 the four statistical masters Efron, Hastie, Johnstone and Tibshirani joined together to write the paper Least angle regression, published in the Annals of Statistics. It is that paper that sent the LASSO to the podium. The reason? They removed a computational barrier. Armed with a new ingenious geometric interpretation, they presented an algorithm for solving the LASSO problem. The algorithm is as simple as solving an OLS problem, and with computer code to accompany their paper, the LASSO was set for its liftoff*.

The LASSO overall reduces model complexity. It does this by completely excluding some variables, using only a subset of the original potential explanatory variables. Since a sparser model tells a clearer story, the reduction in complexity is a desired property. The clarity of the authors’ exposition and well rehashed computer code are further reasons for the fully justified, full-fledged LASSO flareup.
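To see how variables get excluded completely, consider the special case of an orthonormal design, where the LASSO solution has a closed form: each OLS coefficient is soft-thresholded. The sketch below assumes the objective is written as half the residual sum of squares plus `lam` times the sum of absolute coefficients. The coefficient values are hypothetical, and this is not the least angle regression algorithm from the paper.

```python
def soft_threshold(b, lam):
    """Shrink b toward zero by lam; anything within lam of zero becomes exactly zero."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

# Hypothetical OLS coefficients from a design with orthonormal columns
ols_coefs = [2.5, -0.3, 0.1, -1.2]
lam = 0.5  # penalty level, chosen arbitrarily for the illustration

lasso_coefs = [soft_threshold(b, lam) for b in ols_coefs]
print(lasso_coefs)
```

The two small coefficients are set exactly to zero, so the corresponding variables drop out of the model, while the two large ones are kept but shrunk toward zero.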

This is not a LASSO tutorial. Google-search results, undoubtedly refined over years of increased popularity, are clear enough by now. Also, if you are still reading this I imagine you already know what the LASSO is and how it works. To continue from this point, what follows is a selective list of milestones from the academic literature: some theoretical and practical extensions.

The annual useR! conference

Blog, Miscellaneous | Posted on 06/27/2017

This year on the 4th of July I will be attending the annual useR! conference. While it is often held in the US, this year the useR! conference takes place in nearby Brussels. Sweet.

The website is state-of-the-art “don’t make me think” style. The program looks amazing. Belgian beers with the R community, exciting. Registration still open.

Watch this space for highlights and afterthoughts.

Density Estimation Using Regression

Blog, Statistics and Econometrics | Posted on 06/26/2017

Density estimation using regression? Yes we can!

I like regression. It is one of those simple yet powerful statistical methods. You always know exactly what you are doing. This post is about density estimation, and how to get an estimate of the density using (Poisson) regression.
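One way this can work, sketched under my own assumptions rather than taken from the post: bin the data, treat the bin counts as Poisson, regress them on a polynomial in the bin midpoints with a log link, and rescale the fitted counts so they integrate to one. The histogram below is hypothetical, the quadratic basis is an arbitrary choice, and `solve` and `poisson_regression` are small illustrative helpers written with the standard library only.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def poisson_regression(X, y, iterations=25):
    """Poisson regression with a log link, fitted by iteratively reweighted least squares."""
    k = len(X[0])
    eta = [math.log(yi + 0.5) for yi in y]  # starting values for the linear predictor
    beta = [0.0] * k
    for _ in range(iterations):
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        # Weighted least squares with weights mu: (X'WX) beta = X'Wz
        xtwx = [[sum(m * row[i] * row[j] for row, m in zip(X, mu)) for j in range(k)]
                for i in range(k)]
        xtwz = [sum(m * row[i] * zi for row, m, zi in zip(X, mu, z)) for i in range(k)]
        beta = solve(xtwx, xtwz)
        eta = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return beta

# Hypothetical histogram: bin midpoints, counts per bin, and a common bin width
midpoints = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
counts = [2, 10, 28, 40, 28, 10, 2]
width = 1.0

X = [[1.0, x, x * x] for x in midpoints]  # intercept, linear and quadratic terms
beta = poisson_regression(X, counts)
fitted = [math.exp(sum(b * v for b, v in zip(beta, row))) for row in X]

# Rescale fitted counts so that the density integrates to one over the bins
total = sum(fitted)
density = [f / (total * width) for f in fitted]
for x, d in zip(midpoints, density):
    print(f"{x:5.1f}  {d:.4f}")
```

With a quadratic in the exponent the fitted density is forced into a Gaussian-like shape; adding higher powers of the midpoints to `X` would allow more flexible shapes at the cost of more parameters.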

Computer Age Statistical Inference – now free

Blog, Statistics and Econometrics | Posted on 06/05/2017

If you consider yourself an econometrician, a statistician, or one of those numerous buzzword synonyms floating around these days, Computer Age Statistical Inference: Algorithms, Evidence and Data Science by Bradley Efron and Trevor Hastie is a book you shouldn’t miss, and now you don’t have to: you can download the book for free.

My first inclination is to deliver an unequivocal recommendation. But in truth, my praises would probably fall short of what was already written.

So what can I give you? I can say that there are currently 6 Amazon reviews, with a 4.5 average. One of the reviewers writes that there is some overlap with previous work. I agree. But it doesn’t matter. It reads so well, call it a refresher. Let’s face it, it is not as if everything is always so clear in your head that you can afford to skip sections because you read something similar before.

I can also tell you why I think it is a special book.

Random Books

Blog, Miscellaneous | Posted on 06/03/2017

It seems like a very long while since my bachelor’s degree. Checking my bookshelf the other day, I thought I would flag some of the books which helped or inspired me along the way. Here they are in no particular order.

Statistical Shrinkage

Blog, Statistics and Econometrics | Posted on 05/10/2017

Shrinkage in statistics has increased in popularity over the decades. Now statistical shrinkage is commonplace, explicitly or implicitly.

But when do we need to make use of shrinkage? At least partly, it depends on the signal-to-noise ratio.

Top countries in poker (Test equality of proportions using bootstrap)

Blog, Miscellaneous, Statistics and Econometrics | Posted on 05/04/2017

Every once in a while I play poker online. The poker site allows you to ask for your tournament history. You get an email which contains hundreds of summaries (I open several tables at once, so I have quite some history). A typical summary looks as follows:

Machine Trading – book review

Blog, Finance and Trading, Miscellaneous | Posted on 05/02/2017

In trading and in trading-related research one can quickly be overwhelmed by the sea of ink devoted to trading strategies and the like. It is essential that you “pick your battles”, so to speak. I recently finished reading Machine Trading, by Ernest Chan. Here is what I think about the book.