Most popular posts – 2015

The top three for the year are:
Out-of-sample data snooping
Code for my yield curve forecasting paper
Review of a couple of books
I personally enjoyed the most writing a few words on ML estimation, and about those great statistical discoveries. Since the last post did not involve any code or images I initially thought it would be a breeze. I in fact spent twice the time I usually do, and it was all good fun.

In 2015 I wrote quite a bit about volatility and correlation. In 2016 I plan to learn more (so to write more) about portfolio construction.

Present-day great statistical discoveries

Some time during the 18th century the biologist and geologist Louis Agassiz said: “Every great scientific truth goes through three stages. First, people say it conflicts with the Bible. Next they say it has been discovered before. Lastly they say they always believed it”. Nowadays I am not sure about the Bible but yeah, it happens.

I express here my long-standing and long-lasting admiration for the following triplet of present-day great discoveries. The authors of all three papers had initially struggled to advance their ideas, which echos the quote above. Here they are, in no particular order.

More

Multivariate volatility forecasting (4), factor models

To be instructive, I always use very few tickers to describe how a method works (and this tutorial is no different). Most of the time is spent on methods that we can easily scale up. Even if exemplified using only say 3 tickers, a more realistic 100 or 500 is not an obstacle. But, is it really necessary to model the volatility of each ticker individually? No.

If we want to forecast the covariance matrix of all components in the Russell 2000 index we don’t leave much on the table if we model only a few underlying factors, much less than 2000.

Volatility factor models are one of those rare cases where the appeal is both theoretical and empirical. The idea is to create a few principal components and, under the reasonable assumption that they drive the bulk of comovement in the data, model those few components only.

More

Multivariate volatility forecasting (3), Exponentially weighted model

Broadly speaking, complex models can achieve great predictive accuracy. Nonetheless, a winner in a kaggle competition is required only to attach a code for the replication of the winning result. She is not required to teach anyone the built-in elements of his model which gives the specific edge over other competitors. In a corporation settings your manager and his manager and so forth MUST feel comfortable with the underlying model. Mumbling something like “This artificial-neural-network is obtained by using a grid search over a range of parameters and connection weights where the architecture itself is fixed beforehand…”, forget it!

More

Correlation and correlation structure (2), copulas

This post is about copulas and heavy tails. In a previous post we discussed the concept of correlation structure. The aim is to characterize the correlation across the distribution. Prior to the global financial crisis many investors were under the impression that they were diversified, and they were, for how things looked there and then. Alas, when things went south, correlation in the new southern regions turned out to be different\stronger than that in normal times. The hard-won diversification benefits evaporated exactly when you needed them the most. This adversity has to do with fat-tail in the joint distribution, leading to great conceptual and practical difficulties. Investors and bankers chose to swallow the blue pill, and believe they are in the nice Gaussian world, where the math is magical and elegant. Investors now take the red pill, where the math is ugly and problems abound.

More

Multivariate volatility forecasting, part 2 – equicorrelation

Last time we showed how to estimate a CCC and DCC volatility model. Here I describe an advancement labored by Engle and Kelly (2012) bearing the name: Dynamic equicorrelation. The idea is nice and the paper is well written.

Departing where the previous post ended, once we have (say) the DCC estimates, instead of letting the variance-covariance matrix be, we force some structure by way of averaging correlation across assets. Generally speaking, correlation estimates are greasy even without any breaks in dynamics, so I think forcing some structure is for the better.

More

Correlation and correlation structure (1); quantile regression

Given a constant speed, time and distance are fully correlated. Provide me with the one, and I’ll give you the other. When two variables have nothing to do with each other, we say that they are not correlated.

You wish that would be the end of it. But it is not so. As it is, things are perilously more complicated. By far the most familiar correlation concept is the Pearson’s correlation. Pearson’s correlation coefficient checks for linear dependence. Because of it, we say it is a parametric measure. It can return an actual zero even when the two variables are fully dependent on each other (link to cool chart).

More

Show yourself (look “under the hood” of a function in R)

Open source software has many virtues. Being free is not the least of which. However, open source comes with “ABSOLUTELY NO WARRANTY” and with no power comes no responsibility (I wonder..). Since no one is paying, by definition it is your sole responsibility to make sure the code does what it is supposed to be doing. Thus, looking “under the hood” of a function written by someone else is can be of service. There are more reasons to examine the actual underlying code.

More

Live volatility monitor

In April this year, Rstudio notified early users of shiny that Glimmer and Spark servers which host interactive-applications would be decommissioned. Basically, the company is moving forward to generate revenues from this great interactive application service. For us aspirants who use the service strictly as a hobby, that means, in a word: pay.

Basic subscription now costs around 40$ per month. Keeping your applications free of charge is possible BUT, as long as it is not used for more than 25 hours per month. So if your site generate some traffic, most users would simply not be able to access the app. Apart from that, you are subject to some built-in Rstudio’s logo which can’t be removed without having a paid subscription. That is a shame, but a company’s gotta eat right? I am using Rstudio’s services from their very beginning, and the company definitely deserve to eat! only I wish there would be another step between the monthly 0$ option which provides too slim capabilities, and the monthly 40$ option which is, in my admittedly biased opinion, too pricey for a ‘sometimes’ hobby.

More

Multivariate volatility forecasting (1)

Introduction

When hopping from univariate volatility forecasts to multivariate volatility forecast, we need to understand that now we have to forecast not only the univariate volatility element, which we already know how to do, but also the covariance elements, which we do not know how to do, yet. Say you have two series, then this covariance element is the off-diagonal of the 2 by 2 variance-covariance matrix. The precise term we should use is “variance-covariance matrix”, since the matrix consists of the variance elements on the diagonal and the covariance elements on the off-diagonal. But since it is very tiring to read\write “variance-covariance matrix”, it is commonly referred to as the covariance matrix, or sometimes less formally as var-covar matrix.

More

How regression statistics mislead experts

This post concerns a paper I came across checking the nominations for best paper published in International Journal of Forecasting (IJF) for 2012-2013. The paper bears the annoyingly irresistible title: “The illusion of predictability: How regression statistics mislead experts”, and was written by Soyer Emre and Robin Hogarth (henceforth S&H). The paper resonates another paper published in “Psychological review” (1973), by Daniel Kahneman and Amos Tversky: “On the psychology of prediction”. Despite the fact that S&H do not cite the 1973 paper, I find it highly related.

More