This post concerns a paper I came across while checking the nominations for the best paper published in the International Journal of Forecasting (IJF) in 2012-2013. The paper bears the annoyingly irresistible title “The illusion of predictability: How regression statistics mislead experts”, and was written by Emre Soyer and Robin Hogarth (henceforth S&H). It echoes another paper, published in Psychological Review (1973) by Daniel Kahneman and Amos Tversky: “On the psychology of prediction”. Although S&H do not cite the 1973 paper, I find the two closely related.
Category: Miscellaneous
Dark background theme for reading
I am picking up on Rob Hyndman’s suggestion about dark themes for writing, and I followed up with a bit of “internet-scientific” reading. Opinions on which is better for your eyes, a dark or a white background theme, are mixed. You are busy, so here is just the bottom line: do what works for you.
R vs MATLAB (Round 3)
At least for me, R by faR. MATLAB has its own way of doing things, which, to be honest, can probably be defended from many angles. Here are a few examples of not-so-subtle differences between R and MATLAB:
My favourite statistician
We are all standing on the shoulders of giants. Bradley Efron is one such giant, first with his invention of the bootstrap in 1979 and later with his very influential 2004 paper on Least Angle Regression (and the accompanying software written in R).
Don’t believe anything you read
I just finished reading “An estimate of the science-wise false discovery rate and application to the top medical literature”. The authors ask how much of what we read in scientific journals is actually incorrect, or false.
Most popular posts – 2013
Here are the posts people found most interesting in 2013:
Understanding Multicollinearity
On p-value
Bootstrapping time series
Quantile Autoregression in R
My own favourite:
How Important is Variable Selection?
The Importance of being inauthentic
Presenting properly is important. Here is how I think it should look:
Design quotes
Slides 18 and 30 are especially nice:
Quantify your jogging
Numbers are useful (I think we can all agree on that). If you own a smartphone, you can install the Runmeter app. When you go for a run, take the smartphone with you and activate the app to collect interesting numbers such as distance, pace, fastest pace, heart rate*, calories, etc. We can then load the statistics collected over the past months into R and take a quantified look at the progress.
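The post does this in R; purely as an illustration of one such “quantified look”, here is a minimal Python sketch that computes the pace of each run and a least-squares trend in pace over time. The `Run` record layout (days since the first run, distance in km, duration in minutes) is a hypothetical assumption, not Runmeter's actual export format, and the sample runs are made up.

```python
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical record layout; Runmeter's real export format may differ.
    day: int             # days since the first recorded run
    distance_km: float   # distance covered, in kilometres
    duration_min: float  # total running time, in minutes


def pace_min_per_km(run: Run) -> float:
    """Average pace of a single run, in minutes per kilometre."""
    return run.duration_min / run.distance_km


def pace_trend(runs: list[Run]) -> float:
    """Ordinary least-squares slope of pace against day.

    A negative slope means minutes per km are dropping, i.e. getting faster.
    """
    xs = [r.day for r in runs]
    ys = [pace_min_per_km(r) for r in runs]
    n = len(runs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Made-up runs, for illustration only.
    runs = [Run(0, 5.0, 30.0), Run(7, 5.0, 29.0), Run(14, 6.0, 34.2), Run(21, 5.0, 27.5)]
    for r in runs:
        print(f"day {r.day:>2}: {pace_min_per_km(r):.2f} min/km")
    print(f"trend: {pace_trend(runs):+.4f} min/km per day")
```

Pace (minutes per km) is used rather than speed because that is what running apps usually report; a simple linear slope is the crudest possible summary of “progress”, and a moving average or a smoother would be natural next steps.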
R and Dropbox
When you woRk, you probably have a set of useful functions and packages you use constantly. For example, I often use the excellent quantmod package and the nice multi.sapply function. You want your tools loaded as soon as an R session fires up.
Elegant backup for your smart phone contacts
A few days ago I dropped my iPhone and cracked it. Although the iPhone still works, I decided it would be good to have a backup of my contacts on my desktop. A fancy backup can be achieved with the following two-step procedure: first, sync your contacts' information with Facebook; second, send yourself an Excel file with full details of your mobile contacts (phone number, date of birth, home page, work address and other details extracted from their Facebook page). The process takes only a few minutes and is free.
Behavioral Economics in Action
Do doctors unnecessarily prolong colonoscopies? The answer is: they surely might.
Random books
It seems like a very long while since my bachelor's degree. Checking my bookshelf the other day, I thought I would flag some of the books that helped or inspired me along the way. Here they are, in no particular order.
Marriage is good for your income
For those of you who are into machine learning, here you can find a cool collection of databases to play around with using your favorite algorithm. I chose one of the 200 available and fit a logistic regression model. The idea is to see which characteristics are common among those who earn above 50K a year. Our “y” variable is binary: it takes the value 1 if the individual earns above 50K and 0 otherwise. We know many things about each individual: level of education in years, age, whether she is married, where she is from, which sector she works in, how many hours per week she works, race, and more. We can fit a logistic regression, which is quite standard for a binary dependent variable, and see which variables are important.
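The post fits the model in R (a call along the lines of `glm(..., family = binomial)` would be the usual route). Purely to illustrate what “fitting a logistic regression” does, here is a minimal from-scratch Python sketch that maximises the log-likelihood by batch gradient ascent. The tiny dataset and its two features (a scaled education variable and a married indicator) are made up and hypothetical; this is not the post's data or code.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: list[float], row: tuple[float, ...]) -> float:
    """P(y = 1 | row) under weights w, where w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))


def fit_logistic(X, y, lr: float = 0.1, epochs: int = 5000) -> list[float]:
    """Maximise the mean log-likelihood by batch gradient ascent.

    X: feature rows; y: 0/1 labels (1 = earns above 50K).
    Returns [intercept, coef_1, coef_2, ...].
    """
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, label in zip(X, y):
            err = label - predict_proba(w, row)  # d(log-lik) / d(linear score)
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


# Made-up toy data, NOT the dataset from the post.
# Features: (hypothetical scaled education, e.g. (years - 12) / 4, married 0/1).
# Label: 1 if the (fictional) individual earns above 50K.
X = [(-1.0, 0), (-0.5, 0), (0.0, 0), (0.5, 0), (1.0, 0),
     (-1.0, 1), (-0.5, 1), (0.0, 1), (0.5, 1), (1.0, 1)]
y = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

if __name__ == "__main__":
    w = fit_logistic(X, y)
    print("intercept, education, married =", [round(v, 3) for v in w])
```

Reading the output: a positive coefficient on the married indicator means that, holding the education feature fixed, being married raises the predicted log-odds of earning above 50K in this toy data. It says nothing about the real dataset; a library routine would also report standard errors, which this sketch omits.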