Blog - 13/16 - Eran Raviv

My favourite statistician

Blog, Miscellaneous, Statistics and Econometrics | Posted on 02/17/2014

We are all standing on the shoulders of giants. Bradley Efron is one such giant, with the invention of the bootstrap in 1979 and, later, his very influential 2004 paper on Least Angle Regression (and the accompanying software written in R).

R vs MATLAB – round 2

Blog, Code | Posted on 02/03/2014

R takes it. I prefer coding in R over MATLAB. I feel R understands that I do not like to type too much. A few examples:

R vs Matlab (round 1)

Blog, R | Posted on 01/27/2014

Matlab has it this time, with solid 3D plotting capabilities.

Don’t believe anything you read

Blog, Miscellaneous, Statistics and Econometrics | Posted on 01/21/2014

I just finished reading An estimate of the science-wise false discovery rate and application to the top medical literature. The authors ask how much of what we read in scientific journals is actually incorrect, or false.

Most popular posts – 2013

Blog, Miscellaneous | Posted on 01/07/2014

Here are (what people think) the most interesting posts of 2013:
Understanding Multicollinearity
On p-value
Bootstrapping time series
Quantile Autoregression in R

My own favourite:
How Important is Variable Selection?

What is overfitting?

Blog, Statistics and Econometrics | Posted on 12/23/2013

Overfitting is strongly related to variable selection. It is a common problem and a tough one, best explained by way of example.
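In that spirit, here is a minimal sketch of one such example, not taken from the post itself. It assumes a response that is pure noise and 50 candidate regressors that are also pure noise (all names and sizes are my own illustrative choices). Picking the regressor with the best in-sample fit gives a flattering in-sample R-squared that does not survive on fresh data.

```python
import random

def fit_line(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(x, y, intercept, slope):
    """1 - SSE/SST for a given line; can be negative out of sample."""
    my = sum(y) / len(y)
    sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1.0 - sse / sst

def overfit_demo(n_train=50, n_test=2000, n_candidates=50, seed=7):
    """Select the best of many noise regressors, then score it on new data."""
    rng = random.Random(seed)
    y_train = [rng.gauss(0, 1) for _ in range(n_train)]
    best = None
    for _ in range(n_candidates):
        x_train = [rng.gauss(0, 1) for _ in range(n_train)]
        a, b = fit_line(x_train, y_train)
        r2 = r_squared(x_train, y_train, a, b)
        if best is None or r2 > best[0]:
            best = (r2, a, b)
    in_r2, a, b = best
    # Fresh observations: the chosen regressor is still unrelated to y.
    x_test = [rng.gauss(0, 1) for _ in range(n_test)]
    y_test = [rng.gauss(0, 1) for _ in range(n_test)]
    out_r2 = r_squared(x_test, y_test, a, b)
    return in_r2, out_r2

if __name__ == "__main__":
    in_r2, out_r2 = overfit_demo()
    print(f"in-sample R2: {in_r2:.3f}, out-of-sample R2: {out_r2:.3f}")
```

The in-sample number rewards the search over candidates, not any real relationship, which is why the selected line does no better than the sample mean on new observations.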

Comments on Comments in R

Blog, R | Posted on 12/06/2013

When you are busy with a lengthy project, like writing a paper, you create many objects along the way. Every time you return to the project, you need to remember what is what. In the past, at the start of each new working session I used to rerun the script from scratch and follow what each line was doing until I got back the objects I needed and could continue working. Apart from helping you remember what you are doing, this is very useful for reproducibility, at least given your data, in the sense that you are sure nothing was overwritten from the console and everything is there in the script. Those days are over.

The Importance of being inauthentic

Blog, Miscellaneous, MiscTips | Posted on 11/09/2013

Presenting properly is important. Here is how I think it should look:

Omitted Variable Bias

Blog, Statistics and Econometrics | Posted on 09/30/2013

Frequently, we see the term ‘control variables’. The researcher introduces dozens of explanatory variables she has no interest in. This is done in order to avoid the so-called ‘Omitted Variable Bias’.

What is Omitted Variable Bias?

In general, the OLS estimator has great properties, not the least of which is that it is unbiased even with a finite number of observations: on average you faithfully retrieve the marginal effect of X on Y, that is, $E(\widehat{\beta}) = \beta$. This is very much not the case when a variable that should be included in the model is left out. As in my previous posts about Multicollinearity and heteroskedasticity, I only try to provide the intuition, since you are probably familiar with the result itself.
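A small simulation can make the bias concrete. This is a sketch under my own assumptions, not code from the post: the true model is y = beta1*x1 + beta2*x2 + noise, and x2 is correlated with x1 through x2 = delta*x1 + noise. Regressing y on x1 alone should then give a slope near beta1 + beta2*delta rather than beta1. The function names and parameter values are illustrative.

```python
import random

def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals from a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = ols_slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

def simulate_ovb(n=20000, beta1=1.0, beta2=2.0, delta=0.5, seed=1):
    """Return the x1 coefficient with x2 omitted and with x2 included."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [delta * a + rng.gauss(0, 1) for a in x1]
    y = [beta1 * a + beta2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    # Short regression: x2 is left out, so x1 picks up part of its effect.
    short_b1 = ols_slope(x1, y)
    # Long regression via Frisch-Waugh-Lovell: partial x2 out of both
    # x1 and y, then regress the residuals on each other.
    long_b1 = ols_slope(residuals(x2, x1), residuals(x2, y))
    return short_b1, long_b1

if __name__ == "__main__":
    short_b1, long_b1 = simulate_ovb()
    print(f"x2 omitted: {short_b1:.3f}, x2 included: {long_b1:.3f}")
```

With these assumed values the omitted-variable estimate should sit close to 1 + 2 * 0.5 = 2, while the estimate that controls for x2 should sit close to the true value of 1.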

Bayesian vs. Frequentist in Practice (cont’d)

Blog, Statistics and Econometrics | Posted on 09/23/2013

A few weeks back I simulated a model and made the point that, in practice, the difference between the Bayesian and Frequentist approaches is not large. Here I apply the code to some real data: a model for Industrial Production (IP).

Stocks with upside potential

Blog, Finance and Trading | Posted on 09/15/2013

THIS IS NOT INVESTMENT ADVICE. ACTING BASED ON THIS POST MAY, AND IN ALL PROBABILITY WILL, CAUSE MONETARY LOSS.

Quantile regression is now established as an important econometric tool. Unlike mean regression (OLS), the target is not the mean given x but some quantile given x. You can use it to find stocks that present good upside potential. You may think this has to do with the beta of a stock, but the beta is OLS-related, and is symmetric. A high-beta stock rewards you with an upside swing if the market spikes but, symmetrically, you can suffer a large drawdown when the market drops. This is not upside potential.
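To illustrate the asymmetry that a single beta cannot capture, here is a rough sketch on synthetic data (no real stocks, and not the method used in the post). It assumes a hypothetical stock whose return is the market return plus noise that gets wider when the market is up, so the upper quantile reacts to the market more strongly than the lower one. The fit is a brute-force search: with one regressor, a minimiser of the check (pinball) loss can be found among lines through two data points, so trying every pair is enough for a small sample. Real packages solve this as a linear program instead.

```python
import random
from itertools import combinations

def pinball_loss(tau, u):
    """Check loss for residual u at quantile level tau."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def quantile_line(x, y, tau):
    """Brute-force quantile regression with one regressor and an intercept."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a valid candidate
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(pinball_loss(tau, yk - intercept - slope * xk)
                   for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

def synthetic_returns(n=100, seed=11):
    """Hypothetical data: noise scale (1.2 + market) grows with the market."""
    rng = random.Random(seed)
    market = [rng.uniform(-1, 1) for _ in range(n)]
    stock = [m + (1.2 + m) * rng.gauss(0, 1) for m in market]
    return market, stock

if __name__ == "__main__":
    market, stock = synthetic_returns()
    for tau in (0.1, 0.9):
        _, slope = quantile_line(market, stock, tau)
        print(f"tau={tau}: slope on market = {slope:.2f}")
```

In this assumed setup the slope at the 0.9 quantile should come out well above the slope at the 0.1 quantile, which is the kind of one-sided sensitivity a symmetric OLS beta averages away.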

Bayesian vs. Frequentist in Practice

Blog, Statistics and Econometrics | Posted on 08/28/2013

Rivers of ink have been spilled over the ‘Bayesian vs. Frequentist’ dispute. Most of us were trained as Frequentists, probably because the computational power needed for Bayesian analysis was not around when the syllabus of your statistics/econometrics courses was formed. In this age of tablets and fast internet connections, your training does not matter much; you can easily move between the two approaches with the help of the right webpages and communities. I will not talk about the ideological differences between the two, or which approach is more appealing and why. Larry Wasserman already gave an excellent review.

Understanding Multicollinearity

Blog, R, Statistics and Econometrics | Posted on 06/12/2013

Roughly speaking, Multicollinearity occurs when two or more regressors are highly correlated. As with heteroskedasticity, students often know what it means and how to detect it, and are taught how to cope with it, but not why it is so. From Wikipedia: “In this situation (Multicollinearity) the coefficient estimates may change erratically in response to small changes in the model or the data.” The Wikipedia entry continues to discuss detection, implications and remedies. Here I try to provide the intuition.
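The "erratic" behaviour in the quote can be seen in a quick simulation. This is a sketch under my own assumptions rather than the post's code: two regressors with correlation rho, true coefficients of 1 on each, and repeated samples of size 50. The spread of the estimated coefficient on x1 across samples is compared for rho = 0 and rho = 0.95. The variance inflation factor for two regressors, 1 / (1 - rho^2), gives the textbook benchmark for how much the variance grows.

```python
import random
from statistics import stdev

def slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def partial_out(z, v):
    """Residuals of v after regressing it on z (with intercept)."""
    n = len(z)
    mz, mv = sum(z) / n, sum(v) / n
    b = slope(z, v)
    return [(vi - mv) - b * (zi - mz) for zi, vi in zip(z, v)]

def coef_on_x1(x1, x2, y):
    """Coefficient on x1 in a regression of y on x1 and x2 (Frisch-Waugh-Lovell)."""
    return slope(partial_out(x2, x1), partial_out(x2, y))

def vif(rho):
    """Variance inflation factor when there are exactly two regressors."""
    return 1.0 / (1.0 - rho ** 2)

def sampling_sd(rho, n=50, reps=300, seed=3):
    """Standard deviation of the x1 coefficient across simulated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        # x2 has unit variance and correlation rho with x1.
        x2 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in x1]
        y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        estimates.append(coef_on_x1(x1, x2, y))
    return stdev(estimates)

if __name__ == "__main__":
    print("sd of estimate, rho=0.00:", round(sampling_sd(0.0), 3))
    print("sd of estimate, rho=0.95:", round(sampling_sd(0.95), 3))
    print("VIF at rho=0.95:", round(vif(0.95), 2))
```

The model and the noise level are identical in both runs; only the correlation between the regressors changes, yet the estimate becomes several times less stable.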

How Important is Variable Selection?

Blog, R, Statistics and Econometrics | Posted on 05/22/2013

Very.

If you have 10 possible independent regressors, none of which matter, you still have a good chance of finding that at least one appears important.
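How good a chance? A back-of-the-envelope sketch, assuming 10 independent tests at the 5% level: the probability of at least one false positive is 1 - 0.95^10, roughly 40%. The code below checks that figure by simulation. The setup is my own simplification: each noise regressor is tested in its own simple regression, and a normal critical value stands in for the exact t critical value.

```python
import random
from statistics import NormalDist

def t_stat(x, y):
    """t-statistic for the slope in a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / (sxx * syy) ** 0.5
    return r * ((n - 2) / (1 - r * r)) ** 0.5

def analytic_rate(k=10, alpha=0.05):
    """P(at least one of k independent tests rejects) when nothing matters."""
    return 1 - (1 - alpha) ** k

def simulated_rate(k=10, n=100, reps=500, alpha=0.05, seed=5):
    """Share of simulated data sets with at least one 'significant' regressor."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96, normal approximation
    hits = 0
    for _ in range(reps):
        y = [rng.gauss(0, 1) for _ in range(n)]
        for _ in range(k):
            x = [rng.gauss(0, 1) for _ in range(n)]  # unrelated to y by construction
            if abs(t_stat(x, y)) > crit:
                hits += 1
                break  # one false positive is enough for this data set
    return hits / reps

if __name__ == "__main__":
    print("analytic :", round(analytic_rate(), 3))
    print("simulated:", round(simulated_rate(), 3))
```

The simulated share should land in the neighbourhood of the analytic value, give or take simulation noise and the small error from the normal approximation.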

Design quotes

Blog, Miscellaneous | Posted on 05/16/2013

Slides 18 and 30 are especially nice: