Econometric - 4/5 - Eran Raviv

Multivariate volatility forecasting (1)

Blog, Finance and Trading, Risk, Statistics and Econometrics - Posted on 07/13/2015

Introduction

When moving from univariate to multivariate volatility forecasts, we need to forecast not only the univariate volatility elements, which we already know how to do, but also the covariance elements, which we do not yet know how to do. Say you have two series; then the covariance element is the off-diagonal entry of the 2 by 2 variance-covariance matrix. The precise term is “variance-covariance matrix”, since the matrix consists of the variance elements on the diagonal and the covariance elements off the diagonal. But since it is very tiring to read/write “variance-covariance matrix”, it is commonly referred to as the covariance matrix, or sometimes less formally as the var-covar matrix.
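To make the layout of the 2 by 2 matrix concrete, here is a minimal sketch in Python with NumPy. The two return series are made-up numbers used purely for illustration; they are not data from the post.

```python
import numpy as np

# Two hypothetical return series (made-up numbers, for illustration only).
x = np.array([0.01, -0.02, 0.015, 0.03, -0.005])
y = np.array([0.02, -0.01, 0.010, 0.025, -0.015])

# np.cov treats each row as one variable and returns the 2 x 2 matrix.
sigma = np.cov(np.vstack([x, y]))

var_x, var_y = sigma[0, 0], sigma[1, 1]   # diagonal: the two variances
cov_xy = sigma[0, 1]                      # off-diagonal: the covariance
```

The matrix is symmetric, so `sigma[0, 1]` and `sigma[1, 0]` hold the same covariance; that single number is the extra element a multivariate forecast has to deliver.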

How regression statistics mislead experts

Blog, Miscellaneous, Statistics and Econometrics - Posted on 06/29/2015

This post concerns a paper I came across while checking the nominations for best paper published in the International Journal of Forecasting (IJF) for 2012-2013. The paper bears the annoyingly irresistible title “The illusion of predictability: How regression statistics mislead experts”, and was written by Emre Soyer and Robin Hogarth (henceforth S&H). The paper resonates with another paper, published in Psychological Review (1973) by Daniel Kahneman and Amos Tversky: “On the psychology of prediction”. Although S&H do not cite the 1973 paper, I find it highly related.

PCA as regression (2)

Blog, Statistics and Econometrics - Posted on 06/17/2015

In a previous post on this subject, we related the loadings of the principal components (PCs) from the singular value decomposition (SVD) to the regression coefficients of the PCs on the X matrix. This is natural, given that the factors are supposed to condense the information in X, and what better way to do that than to minimize the sum of squares between a linear combination of X (the factors) and the X matrix itself. A reader asked where principal component regression (PCR) enters. Here we relate PCR to the usual OLS.
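One way to see the link between PCR and OLS is the sketch below. It is an illustration on simulated data, not the derivation from the post. The data, the seed, and the helper `pcr_coefficients` are all hypothetical. The idea: regress y on the first k factors, then map the factor coefficients back to coefficients on X through the loadings. When all P components are kept, this reproduces the OLS coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
T, P = 200, 5
X = rng.standard_normal((T, P))
y = X @ np.array([1.0, -0.5, 0.0, 2.0, 0.3]) + rng.standard_normal(T)

# Centre the data so that no intercept is needed.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Loadings: right singular vectors of the centred X.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
V = Vt.T

def pcr_coefficients(k):
    """Regress y on the first k PCs, then map back to coefficients on X."""
    Z = Xc @ V[:, :k]                                # T x k factors
    gamma, *_ = np.linalg.lstsq(Z, yc, rcond=None)   # OLS of y on the factors
    return V[:, :k] @ gamma                          # P coefficients on X

beta_ols, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
beta_pcr_full = pcr_coefficients(P)   # all components kept: equals OLS
beta_pcr_2 = pcr_coefficients(2)      # two components kept: a restricted fit
```

Keeping fewer than P components restricts the coefficient vector to the span of the leading loadings, which is where PCR departs from OLS.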

Quasi-Maximum Likelihood (QML) beauty

Blog, Statistics and Econometrics - Posted on 05/16/2015

Beauty... really? Well, beauty is in the eye of the beholder.

Yield curve forecasting

Code, Statistics and Econometrics - Posted on 03/21/2015

One of my Ph.D. papers was published recently. It deals with yield curve forecasting.
Here is the code for applying the Nelson-Siegel model to any yield curve.
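For readers who want a feel for the model before opening the linked code, here is a minimal sketch in Python. It is not the author's code. It assumes the common three-factor parametrization $y(\tau) = \beta_0 + \beta_1 \frac{1 - e^{-\lambda \tau}}{\lambda \tau} + \beta_2 \left( \frac{1 - e^{-\lambda \tau}}{\lambda \tau} - e^{-\lambda \tau} \right)$ with the decay parameter $\lambda$ held fixed, so the three factors can be estimated by plain OLS. The function names, the value $\lambda = 0.0609$ (a choice often used with maturities in months), and the example curve are assumptions made for illustration.

```python
import numpy as np

def nelson_siegel_loadings(maturities, lam):
    """Level, slope and curvature loadings for the given maturities."""
    tau = np.asarray(maturities, dtype=float)
    slope = (1 - np.exp(-lam * tau)) / (lam * tau)
    curvature = slope - np.exp(-lam * tau)
    return np.column_stack([np.ones_like(tau), slope, curvature])

def fit_nelson_siegel(maturities, yields, lam=0.0609):
    """With lam held fixed, the three factors follow from plain OLS."""
    loadings = nelson_siegel_loadings(maturities, lam)
    target = np.asarray(yields, dtype=float)
    betas, *_ = np.linalg.lstsq(loadings, target, rcond=None)
    return betas

# Hypothetical curve: maturities in months, yields in percent,
# generated from known factors so the fit can be checked.
maturities = [3, 6, 12, 24, 36, 60, 84, 120]
true_betas = np.array([4.0, -2.0, 1.5])
yields = nelson_siegel_loadings(maturities, 0.0609) @ true_betas

betas = fit_nelson_siegel(maturities, yields)   # recovers true_betas
```

Fixing $\lambda$ keeps the problem linear; estimating it jointly with the factors would call for a nonlinear optimizer instead.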

Mom, are we bear yet? (2)

Blog, Finance and Trading, Risk, Statistics and Econometrics - Posted on 10/20/2014

Five weeks ago we took a look at the rising volatility in the (US) equity markets via a time-series threshold model for the VIX. The estimate suggested we are crossing (or have crossed) into the more volatile regime. Here, taking a somewhat different Hidden Markov Model (HMM) approach, we gather more corroboration. (There are a few online references at the bottom if you are not familiar with HMMs. The word ‘hidden’ is used because the state is ‘invisible’.)

Advances in post-model-selection inference (2)

Blog, Statistics and Econometrics - Posted on 10/15/2014

In the previous post we reviewed a way to handle the problem of inference after model selection. I recently read another related paper which approaches this complicated issue from a different angle. The paper, titled ‘A significance test for the lasso’, is a real step forward in this area. The authors develop the asymptotic distribution for the coefficients, accounting for the selection step. A description of the tough problem they successfully tackle can be found here.

The usual way to test whether a variable (say variable j) adds value to your regression is the F-test. We compute the regression twice: once excluding variable j and once including it. We then compare the sums of squared errors. The distribution of the resulting statistic is known: it is F or $\chi^2$, depending on your initial assumptions, hence the F-test or the $\chi^2$-test. These are by far the most common tests to check whether a variable should be included. A problem arises if you searched for variable j beforehand.
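The comparison of the two regressions can be sketched as follows. This is a minimal illustration on simulated data; the variable names, the seed, and the helper `sse` are hypothetical. The statistic computed is the textbook $F = \frac{(SSE_r - SSE_f)/q}{SSE_f/(n - k)}$, with $q$ the number of restrictions and $k$ the number of columns in the full model.

```python
import numpy as np

def sse(X, y):
    """Sum of squared errors from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(1)
n = 100
x1, xj = rng.standard_normal(n), rng.standard_normal(n)
y = 1.0 + 0.5 * x1 + 0.8 * xj + rng.standard_normal(n)

const = np.ones(n)
X_restricted = np.column_stack([const, x1])      # excludes variable j
X_full = np.column_stack([const, x1, xj])        # includes variable j

sse_r, sse_f = sse(X_restricted, y), sse(X_full, y)
q = 1                              # number of restrictions being tested
df = n - X_full.shape[1]           # residual degrees of freedom, full model
f_stat = ((sse_r - sse_f) / q) / (sse_f / df)
```

Under the classical assumptions, `f_stat` is compared with an F distribution with `q` and `df` degrees of freedom. That reference distribution is valid when variable j was fixed in advance, not when it was picked after a search over candidates.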

Advances in post-model-selection inference

Blog, Statistics and Econometrics - Posted on 09/23/2014

Along with improvements in computational power, variable selection has become one of the problems attracting the most effort. We (well... experts) have made huge leaps in the realm of variable selection, with prediction probably being the most common objective. LASSO (Least Absolute Shrinkage and Selection Operator) leads the way from the west (Stanford) with its many variations (Adaptive, Random, Relaxed, Fused, Grouped, Bayesian... you name it), while SCAD (Smoothly Clipped Absolute Deviation) is catching up from the east (Princeton). With the good progress in that area, inference, which is no less important but has been given less attention, is now being worked out.

PCA as regression

Blog, Statistics and Econometrics - Posted on 09/17/2014

A way to think about principal component analysis is as a matrix approximation. We have a matrix $X_{T \times P}$ and we want to get a ‘smaller’ matrix $Z_{T \times K}$ with $K<P$. We want the new ‘smaller’ matrix to be close to the original despite its reduced dimension. Sometimes we say ‘such that Z captures the bulk of comovement in X’. Big data technology is such that nowadays the number of cross-sectional units P (the number of columns in X) has grown to be very large compared to, say, the sixties. Now, with ‘google maps would like to use your current location’ and future ‘google fridge would like to access your amazon shopping list’, you can count on P growing exponentially; we are just getting started. A lot of effort goes into this line of research, and it is advancing in great leaps.
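The matrix-approximation view can be sketched with a truncated SVD. This is an illustrative example on simulated data; the dimensions and the seed are arbitrary assumptions. The first K right singular vectors give the loadings, $Z$ is the projection of the centred $X$ on them, and the squared approximation error equals the sum of the squared discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(2)
T, P, K = 100, 10, 3
X = rng.standard_normal((T, P))
Xc = X - X.mean(axis=0)                 # centre each column

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

Z = Xc @ Vt[:K].T                       # the 'smaller' T x K matrix
X_hat = Z @ Vt[:K]                      # rank-K approximation of X (T x P)

# Squared approximation error and the share of variation that Z retains.
error = np.sum((Xc - X_hat) ** 2)
share_captured = np.sum(s[:K] ** 2) / np.sum(s ** 2)
```

With independent simulated columns, as here, `share_captured` stays modest; with strongly comoving columns a small K would retain most of the variation.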

Bias vs. Consistency

Blog, Statistics and Econometrics - Posted on 06/02/2014

The concepts of unbiasedness and consistency, as well as the relation between the two, are tough to get one’s head around, especially (but not only) for undergraduate students. My aim here is to help with this. We start with a short explanation of the two concepts and follow with an illustration.

Bootstrap criticism (example)

Blog, Statistics and Econometrics - Posted on 05/14/2014

In a previous post I underlined an inherent feature of the non-parametric Bootstrap: its heavy reliance on the (single) realization of the data. This feature is not a bad one per se; we just need to be aware of the limitations. From comments made on the other post regarding this, I gathered that a more concrete example can help get this point across.
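As a small illustration of that reliance (this is not the example from the post, and the sample, seed, and sizes are hypothetical), the sketch below draws non-parametric bootstrap resamples and records the maximum of each. Every resampled value comes from the observed sample, so no resample can ever contain a value beyond the observed maximum.

```python
import numpy as np

rng = np.random.default_rng(3)
sample = rng.standard_normal(30)        # the single realization we observe

# 1,000 bootstrap resamples, each drawn with replacement from `sample`.
idx = rng.integers(0, sample.size, size=(1000, sample.size))
resamples = sample[idx]

boot_max = resamples.max(axis=1)        # maximum of each resample
```

Whatever lies in the tail of the true distribution but happened not to show up in `sample` is invisible to the resampling scheme.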

Bootstrap criticism

Blog, Finance and Trading, Statistics and Econometrics - Posted on 03/12/2014

The title reads Bootstrap criticism, but in fact it should be Non-parametric bootstrap criticism. I am all in favour of Bootstrapping, but I point here to a major drawback.

What is overfitting?

Blog, Statistics and Econometrics - Posted on 12/23/2013

Overfitting is strongly related to variable selection. It is a common problem and a tough one, best explained by way of example.

Omitted Variable Bias

Blog, Statistics and Econometrics - Posted on 09/30/2013

Frequently, we see the term ‘control variables’. The researcher introduces dozens of explanatory variables she has no interest in. This is done in order to avoid the so-called ‘Omitted Variable Bias’.

What is Omitted Variable Bias?

In general, the OLS estimator has great properties, not the least of which is that even with a finite number of observations it is unbiased: on average you faithfully retrieve the marginal effect of X on Y, that is $E(\widehat{\beta}) = \beta$. This is very much not the case when you have a variable that should be included in the model but is left out. As in my previous posts about multicollinearity and heteroskedasticity, I only try to provide the intuition, since you are probably familiar with the result itself.
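The mechanics can be sketched with a short simulation. It is an illustration under assumed coefficients; the data-generating process, the seed, and the helper `ols` are hypothetical. When the omitted $x_2$ is correlated with the included $x_1$, the slope from the short regression equals the slope on $x_1$ from the long regression plus the slope on $x_2$ times $\delta$, where $\delta$ is the slope from regressing $x_2$ on $x_1$.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(4)
n = 500
x1 = rng.standard_normal(n)
x2 = 0.7 * x1 + rng.standard_normal(n)      # x2 is correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.standard_normal(n)

const = np.ones(n)
b_long = ols(np.column_stack([const, x1, x2]), y)   # correct model
b_short = ols(np.column_stack([const, x1]), y)      # x2 omitted
delta = ols(np.column_stack([const, x1]), x2)[1]    # slope of x2 on x1

# In-sample identity: short slope = long slope on x1 + (slope on x2) * delta.
implied_short = b_long[1] + b_long[2] * delta
```

Here the true marginal effect of `x1` is 2.0, while `b_short[1]` lands near 2.0 + 3.0 * 0.7. The bias vanishes only if the omitted variable is uncorrelated with the included one or has no effect on y.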

Bayesian vs. Frequentist in Practice (cont’d)

Blog, Statistics and Econometrics - Posted on 09/23/2013

A few weeks back I simulated a model and made the point that, in practice, the difference between Bayesian and Frequentist is not large. Here I apply the code to some real data: a model for Industrial Production (IP).