Blog - 11/16 - Eran Raviv

How regression statistics mislead experts

Blog, Miscellaneous, Statistics and EconometricsPosted on 06/29/2015

This post concerns a paper I came across checking the nominations for best paper published in International Journal of Forecasting (IJF) for 2012-2013. The paper bears the annoyingly irresistible title: “The illusion of predictability: How regression statistics mislead experts”, and was written by Soyer Emre and Robin Hogarth (henceforth S&H). The paper resonates another paper published in “Psychological review” (1973), by Daniel Kahneman and Amos Tversky: “On the psychology of prediction”. Despite the fact that S&H do not cite the 1973 paper, I find it highly related.

PCA as regression (2)

Blog, Statistics and EconometricsPosted on 06/17/2015

In a previous post on this subject, we related the loadings of the principal components (PC’s) from the singular value decomposition (SVD) to regression coefficients of the PC’s onto the X matrix. This is normal given the fact that the factors are supposed to condense the information in X, and what better way to do that than to minimize the sum of squares between a linear combination of X (the factors) to the X matrix itself. A reader was asking where does principal component regression (PCR) enter. Here we relate the PCR to the usual OLS.

Adding text to R plot

Blog, CodePosted on 06/10/2015

Diversity is a real strength. By now it is common knowledge. I often see institutions openly encourage multinational environment and multidisciplinary professionals, with specific “on-the-job” training to tailor for own needs. No one knows a lot about a lot, so bringing different together enhance independent thinking and knowledge available to the organization. Clarity of communication then becomes even more important, and making sure your figures are quickly understandable goes a long way.

Quasi-Maximum Likelihood (QML) beauty

Blog, Statistics and EconometricsPosted on 05/16/2015

Beauty.. really? well, beauty is in the eye of the beholder.

Dark background theme for reading

Blog, Miscellaneous, MiscTipsPosted on 03/30/2015

I am picking up on Rob Hyndman’s suggestion on dark themes for writing. I carried on with a bit of “internet-scientific” reading. Opinions on ‘dark vs white’ background themes, which is better for your eyes, are mixed. You are busy, so just the bottom line: do what works for you.

Yield curve forecasting

Code, Statistics and EconometricsPosted on 03/21/2015

One of my Ph.D papers was published recently. It deals with yield curve forecasting.
Here is the code for applying the Nelson-Siegel model to any yield curve.

Out-of-sample data snooping

Blog, Risk, Statistics and EconometricsPosted on 03/20/2015

In this day and age, paralleling and mining big data, I like to think about the new complications that follow this abundance. By way of analogy, Alzheimer’s dementia is an awful condition, but we are only familiar with it since medical advances allow for higher life expectancy. Better abilities allow for new predicaments. One of those new predicament is what I call out-of-sample data snooping.

Syntax highlighting style in Rmarkdown

Blog, MiscTips, RPosted on 03/14/2015

Energy idiosyncratic volatility

Blog, Finance and Trading, RiskPosted on 01/29/2015

Recently, volatility has been on the up. Generally, we associate rising volatility with a bear regime, but we also know there is a percolating oil shock. Is the volatility we see in the stock market broad-based, or is it the effect brought about by sharp the drop in oil prices (so related to the energy sector)? I propose here a practical way to take a closer look at it.

Fed Fund Rate futures curve and what they tell us

Blog, Finance and TradingPosted on 01/25/2015

“The Fed is certainly moving forward with plans to normalize interest rates.” We keep on hearing that, we believed it in the past and we believe it now. We believe that the Fed believes and that, in fact, this means something.

Should we become more suspicious and less trusting given history? Let’s take a look.

Most popular posts – 2014

BlogPosted on 01/13/2015

Well.. better late than never:

The solid winner this year is:
R vs MATLAB (Round 3)
Followed by a far second
Mom, are we bear yet? (2)
and third:
Detecting bubbles in real time

And my own personal favorite for the year:
Advances in post-model-selection inference (2)

Linking backtesting with multiple testing

Blog, Finance and Trading, Statistics and EconometricsPosted on 11/23/2014

The other day, Harvey Campbell from Duke University gave a talk where I work. The talk- bearing the exciting name “Backtesting” was based on a paper by the same name.

The authors tackle the important problem of data-snooping; we need to account for the fact that we conducted many trials until we found a strategy (or a variable) that ‘works’. Accessible explanations can be found here and here. In this day and age, the ‘story’ behind what you are doing is more important than ever, given the things you can do using your desktop/laptop.

Mom, are we bear yet? (2)

Blog, Finance and Trading, Risk, Statistics and EconometricsPosted on 10/20/2014

5 weeks ago we took a look at the rising volatility in the (US) equity markets via a time-series threshold model for the VIX. The estimate suggested we are crossing (or crossed) to the more volatile regime. Here, taking somewhat different Hidden Markov Model (HMM) approach we gather more corroboration (few online references at the bottom if you are not familiar with HMM models. The word hidden since the state is ‘invisible’).

Advances in post-model-selection inference (2)

Blog, Statistics and EconometricsPosted on 10/15/2014

In the previous post we reviewed a way to handle the problem of inference after model selection. I recently read another related paper which goes about this complicated issues from a different angle. The paper titled ‘A significance test for the lasso’ is a real step forward in this area. The authors develop the asymptotic distribution for the coefficients, accounting for the selection step. A description of the tough problem they successfully tackle can be found here.

The usual way to test if variable (say variable j) adds value to your regression is using the F-test. We once compute the regression excluding variable j, and once including variable j. Then we compare the sum of squared errors and we know what is the distribution of the statistic, it is F, or $\chi^2$ , depends on your initial assumptions, so F-test or $\chi^2$ -test. These are by far the most common tests to check if a variable should or should not be included. Problem arises if you search for variable j beforehand.

Advances in post-model-selection inference

Blog, Statistics and EconometricsPosted on 09/23/2014

Along with improvements in computational power, variable selection has become one of the problems attracting the most effort. We (well.. experts) have made huge leaps in the realm of variable selection. Prediction being probably the most common objective. LASSO (Least Absolute Sum of Squares Operator) leading the way from the west (Stanford) with its many variations (Adaptive, Random, Relaxed, Fused, Grouped, Bayesian.. you name it), SCAD (Smoothly Clipped Absolute Deviation) catching up from the east (Princeton). With the good progress in that area, not secondary but has been given less attention -> Inference is now being worked out.