R tips and tricks – the assign() function

The R language has some quirks compared to other languages. One thing you need to constantly watch for when moving to or from R is that R starts its indexing at one, while almost all other languages start at zero, which takes some getting used to. Another quirk is how explicit R forces you to be when modifying a variable, compared with other languages.

Take Python, for example, though I think it looks the same in most common languages.
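To make the R side of the comparison concrete, here is a minimal sketch of the explicitness R demands; this is my own illustration, not the post's example:

```r
# A toy illustration of why assign() exists: a function cannot silently
# modify a variable in the global environment.
x <- 1:5
x[1] <- 10                    # direct modification works at the top level
f <- function() x[1] <- 0     # inside a function this edits a local copy only
f()
x[1]                          # still 10: the global x is untouched
assign("x", 0:4, envir = .GlobalEnv)  # assign() modifies x by name, explicitly
x
```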

Continue reading

Curse of dimensionality part 3: Higher-Order Comoments

Higher moments such as skewness and kurtosis are not explored as much as they should be.

These moments are crucial for managing portfolio risk; they are at least as important as volatility, if not more so. Skewness relates to asymmetry risk and kurtosis relates to tail risk.

Despite their great importance, those higher moments enjoy only a small portion of the attention given to their lower, friendlier counterparts: the mean and the variance. In my opinion, one reason for this may be the difficulty of estimating those moments with any accuracy.

It is yet another situation where the Curse of Dimensionality rears its enchanting head (and an idea for a post is born…).
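As a back-of-the-envelope check, here is a small simulation, entirely my own and not from the post, showing how estimation noise grows with the order of the moment:

```r
# Simulate 1000 samples of roughly one year of daily returns and record
# the estimation noise (sd across simulations) of each sample moment.
set.seed(1)
n <- 250
sims <- replicate(1000, {
  x <- rnorm(n)
  c(mean = mean(x),
    var  = var(x),
    skew = mean((x - mean(x))^3) / sd(x)^3,
    kurt = mean((x - mean(x))^4) / sd(x)^4)
})
apply(sims, 1, sd)  # the higher the moment, the noisier its estimate
```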

Continue reading

Portfolio Construction with R

Preview

Constructing a portfolio means allocating your money among a few chosen assets. The simplest thing you can do is split your money evenly among them. Simple as it is, good research shows it works just fine, and even better than other, more sophisticated methods (for example Optimal Versus Naive Diversification: How Inefficient Is the 1/N Portfolio Strategy?). However, there is also good research that declares the opposite (for example Large Dynamic Covariance Matrices), so go figure.

Anyway, this post shows a few of the most common ways to build a portfolio. We will discuss portfolios which are optimized for:

  • Equal Risk Contribution
  • Global Minimum Variance
  • Minimum Tail-Dependence
  • Most Diversified
  • Equal weights

We will optimize based on the first half of the sample and see out-of-sample results in the second half; simply speaking, how those portfolios would have performed.
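To give a flavor of the mechanics, here is a minimal sketch of one method from the list above, the Global Minimum Variance portfolio; the data are simulated and the code is my own, not the post's:

```r
# Global Minimum Variance: minimize w' Sigma w subject to sum(w) = 1.
library(quadprog)
set.seed(1)
returns <- matrix(rnorm(500 * 5, sd = 0.01), ncol = 5)  # 5 toy assets
sigma <- cov(returns)
n <- ncol(sigma)
sol <- solve.QP(Dmat = 2 * sigma, dvec = rep(0, n),
                Amat = matrix(1, n, 1), bvec = 1, meq = 1)
round(sol$solution, 3)  # the GMV weights (shorting allowed here)
```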

Continue reading

Machine learning is simply statistics

Note: I usually write more technical posts, this is an opinion piece. And you know what they say: opinions are like feet, everybody’s got a couple.

A lot of buzzwords nowadays. Data science, business intelligence, machine learning, deep learning, statistical learning, predictive analytics, knowledge discovery, data mining, pattern recognition. Surely you can think of a few more. So many that you can fill a chapter explaining and discussing the differences (e.g. Data Science For Dummies).

But really, we are all after the same thing. The thing being: extracting knowledge from data. Perhaps it is because we want to explain something, perhaps it is because we want to predict something; the reason is secondary. All those terms fall under one umbrella, which is modern statistics, period. I freely admit that the jargon differs. What is now dubbed the feature space in the machine learning literature is simply the independent, or explanatory, variables in the statistical literature. What one calls a softmax classifier in the deep learning context, another calls multinomial regression in a basic Statistics 101 course. Feature engineering? Call it variable transformation rather.
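To make the softmax point concrete: both names describe the same functional form (a standard result; the notation here is my own),

$$
P(y = k \mid x) = \frac{\exp(x^\top \beta_k)}{\sum_j \exp(x^\top \beta_j)},
$$

which is exactly the multinomial (logistic) regression model.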

Continue reading

Bitcoin exponential growth

Is bitcoin a bubble? I don’t know. What defines a bubble? The price should drastically overestimate the underlying fundamentals. I simply don’t know enough about blockchain to have an opinion there. A related characteristic is a runaway price: going up fast just because it is going up fast.
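One simple way to eyeball exponential growth: under pure exponential growth, the log of the price is linear in time. A minimal sketch with simulated prices; this is my own toy data, not actual bitcoin:

```r
# Fit a log-linear trend; the slope estimates the exponential growth rate.
set.seed(1)
day <- 1:500
price <- exp(0.01 * day + cumsum(rnorm(500, sd = 0.02)))  # toy price path
fit <- lm(log(price) ~ day)
coef(fit)["day"]  # estimated per-period exponential growth rate
```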

Continue reading

Most popular posts – 2017

Writing this, I can’t believe how quickly the year 2017 has gone by. Also weird: we are already three weeks into 2018. Unreal. Time flies when you’re having fun, I guess.

The analytics report shows that the three most popular posts for 2017 are:

Continue reading

Understanding Kullback–Leibler Divergence

It is easy to measure the distance between two points. But what about measuring the distance between two distributions? Good question. Long answer. Enter the Kullback–Leibler divergence measure.

The motivation for thinking about the Kullback–Leibler divergence measure is that it lets you tackle questions such as: “how different was the behavior of the stock market this year compared with its average behavior?”. This is a rather different question from the more trivial “how did the return this year compare with the average return?”.
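For discrete distributions P and Q, the measure is defined as the expectation under P of log(p(x)/q(x)). A minimal sketch with a toy example of my own:

```r
# Kullback-Leibler divergence between two discrete distributions:
# D(P || Q) = sum over x of p(x) * log(p(x) / q(x)).
kl_div <- function(p, q) sum(p * log(p / q))
p <- c(0.1, 0.4, 0.5)
q <- c(0.3, 0.4, 0.3)
kl_div(p, q)  # note the asymmetry:
kl_div(q, p)  # D(P||Q) != D(Q||P), hence "divergence" rather than "distance"
```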

Continue reading

R tips and tricks – the pipe operator

The R language has improved over the years. Amidst numerous splendid augmentations, the magrittr package by Stefan Milton Bache allows us to write more readable code. It uses an ingenious piping convention, which will be explained shortly. This post talks about when to use those pipes, and when to avoid them. I am all about that bass readability, but I am also about speed. Use the pipe operator, but watch the tradeoff.
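A minimal sketch, with an example of my own, of what the pipe buys you:

```r
library(magrittr)
x <- rnorm(100)
round(exp(mean(x)), 2)                # nested calls read inside-out
x %>% mean() %>% exp() %>% round(2)   # piped calls read left to right
```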

Continue reading

Bitcoin investing

Bitcoin is a cryptocurrency introduced in 2008. I have never belonged to team “gets it” when it comes to Bitcoin investing, but perhaps the time has come to reconsider.

Continue reading

Visualizing Tail Risk

Tail risk conventionally refers to the risk of a large and sharp drawdown of the portfolio. How large is subjective and depends on how you define the tail.

A lot of research is directed towards getting a good estimate of tail risk. Some fairly recent research also indicates that investors perceive tail risk as a stand-alone risk to be compensated for, rather than as bundled together with the usual variability of the portfolio. So this risk now gets even more attention.
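As a minimal sketch of what such a visualization might look like, with simulated returns and my own choice of a 5% tail:

```r
# Mark the lower tail of a return distribution and compute the average
# loss beyond it (a simple Expected Shortfall estimate).
set.seed(1)
ret <- rnorm(1000, mean = 0, sd = 0.01)
cutoff <- quantile(ret, 0.05)    # the 5% quantile as the tail threshold
hist(ret, breaks = 50, main = "Return distribution", xlab = "Daily return")
abline(v = cutoff, lwd = 2)      # everything left of the line is "the tail"
mean(ret[ret < cutoff])          # average return conditional on being in the tail
```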

Continue reading

LASSO, LASSO, LASSO

LASSO stands for Least Absolute Shrinkage and Selection Operator. It was first introduced 21 years ago by Robert Tibshirani (Regression Shrinkage and Selection via the Lasso, Journal of the Royal Statistical Society, Series B). In 2004, the four statistical masters Efron, Hastie, Johnstone and Tibshirani joined forces to write the paper Least Angle Regression, published in the Annals of Statistics. It is that paper that sent the LASSO to the podium. The reason? They removed a computational barrier. Armed with a new, ingenious geometric interpretation, they presented an algorithm for solving the LASSO problem. The algorithm is as simple as solving an OLS problem, and with computer code accompanying their paper, the LASSO was set for liftoff*.

Overall, the LASSO reduces model complexity. It does this by completely excluding some variables, using only a subset of the original potential explanatory variables. Since this can add to the story of the model, the reduction in complexity is a desirable property. The clarity of the authors’ exposition and well-rehashed computer code are further reasons for the fully justified, full-fledged LASSO flare-up.
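To see the variable exclusion in action, here is a minimal sketch using the glmnet package; the data are simulated and the example is my own, not from the post:

```r
# LASSO via glmnet; alpha = 1 gives the LASSO penalty.
library(glmnet)
set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:3] %*% c(2, -1, 1)) + rnorm(n)  # only 3 variables matter
fit <- cv.glmnet(x, y, alpha = 1)     # lambda chosen by cross-validation
coef(fit, s = "lambda.min")           # note the exact zeros: selection at work
```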

This is not a LASSO tutorial. Google-search results, undoubtedly refined over years of increasing popularity, are clear enough by now. Also, if you are still reading this, I imagine you already know what the LASSO is and how it works. To continue from this point, what follows is a selective list of milestones from the academic literature: some theoretical and some practical extensions.

Continue reading