The good thing about using open-source software is the community around it. There are a great many R packages online, and the CRAN package download logs were recently released. This means we can look at the number of downloads for each package and so get a good feel for their relative popularity. I pulled the log files from the server and checked a few packages known to be related to machine learning. In this post you can see which packages are the community favorites and get a feel for the growth trend of R.
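Counting downloads per package is essentially a group-and-count over the log rows. Below is a minimal Python sketch. It assumes that each log file is a CSV with a header row and a `package` column, and that one row corresponds to one download. The function name `count_downloads`, the package list and the sample rows are hypothetical, made up for illustration only.

```python
import csv
import io
from collections import Counter

def count_downloads(csv_text, packages=None):
    """Count rows (assumed: one row = one download) per package in a CSV log.

    Assumes a header row containing a 'package' column; other columns are ignored.
    If `packages` is given, only those package names are counted.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["package"]
        if packages is None or name in packages:
            counts[name] += 1
    return counts

# Hypothetical daily log (made-up rows, for illustration only).
sample_log = """date,package,country
2014-01-01,randomForest,US
2014-01-01,e1071,DE
2014-01-01,randomForest,IL
2014-01-01,ggplot2,US
2014-01-01,glmnet,GB
"""

ml_packages = {"randomForest", "e1071", "glmnet", "caret"}
counts = count_downloads(sample_log, ml_packages)
for name, n in counts.most_common():
    print(name, n)
```

With many daily files, the same `Counter` objects can simply be added together (`Counter` supports `+`) to get totals over a period.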
There are many problems with p-values, and I too have chipped in at times. I recently sat in on the presentation of an excellent paper, soon to be submitted to the highest-ranked journal in the field. The authors did not conceal their ruthless search for those mesmerizing asterisks indicating significance. I was struck to see that many in the crowd were not aware of the history currently being made regarding those asterisks.
The web is now swarming with thought-provoking discussions about the recent American Statistical Association (ASA) statement on p-values. Despite the ASA's sincere efforts, there is still a lot of back-and-forth over what the statement actually means. Here is how I read it.
The top three for the year are:
Out-of-sample data snooping
Code for my yield curve forecasting paper
Review of a couple of books
What I personally enjoyed most was writing a few words on maximum likelihood (ML) estimation, and about those great statistical discoveries. Since the latter post did not involve any code or images, I initially thought it would be a breeze. In fact I spent twice the time I usually do, and it was all good fun.
In 2015 I wrote quite a bit about volatility and correlation. In 2016 I plan to learn (and so write) more about portfolio construction.
Some time during the 19th century the biologist and geologist Louis Agassiz said: “Every great scientific truth goes through three stages. First, people say it conflicts with the Bible. Next they say it has been discovered before. Lastly they say they always believed it.” Nowadays I am not sure about the Bible, but yeah, it happens.
I would like to express my long-standing admiration for the following triplet of great present-day discoveries. The authors of all three papers initially struggled to advance their ideas, which echoes the quote above. Here they are, in no particular order.
Perhaps it is due to the different jargon used in different disciplines; I am not sure. But for some reason the terms ‘predictions’, ‘forecasts’ and ‘projections’ are frequently used interchangeably.
I have recently reviewed a couple of books.
This post concerns a paper I came across while checking the nominations for the best paper published in the International Journal of Forecasting (IJF) in 2012–2013. The paper bears the annoyingly irresistible title “The illusion of predictability: How regression statistics mislead experts”, and was written by Emre Soyer and Robin Hogarth (henceforth S&H). It echoes another paper, published in Psychological Review in 1973 by Daniel Kahneman and Amos Tversky: “On the psychology of prediction”. Although S&H do not cite the 1973 paper, I find the two highly related.
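To see how regression statistics can give an inflated sense of predictability, here is a minimal simulation. It is my own illustration, not the design of S&H's experiment, and the settings (`n`, `beta`, `noise_sd`, the seed) are arbitrary assumptions. With plenty of noise, the slope comes out highly significant while R² stays small, so any single outcome remains hard to predict.

```python
import math
import random

def simulate(n=1000, beta=1.0, noise_sd=3.0, seed=0):
    """Draw (x, y) pairs from y = beta * x + noise, with x ~ N(0, 1)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, noise_sd) for xi in x]
    return x, y

def ols_summary(x, y):
    """Simple OLS with intercept: returns slope, its t-statistic, R^2 and residual sd."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    resid_sd = math.sqrt(sse / (n - 2))
    t_stat = slope / (resid_sd / math.sqrt(sxx))
    return slope, t_stat, 1.0 - sse / sst, resid_sd

x, y = simulate()
slope, t_stat, r2, resid_sd = ols_summary(x, y)

# How often does the sign of slope * x match the sign of the individual y?
hit_rate = sum((slope * xi > 0) == (yi > 0) for xi, yi in zip(x, y)) / len(x)

print(f"slope={slope:.2f}  t={t_stat:.1f}  R^2={r2:.2f}  resid sd={resid_sd:.2f}")
print(f"sign of y predicted correctly in {hit_rate:.0%} of cases")
```

With these settings the t-statistic should be roughly 10 in expectation while R² is only about 0.1, and the sign of an individual outcome is called correctly only about 60% of the time, not far from a coin flip.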
At least for me, it is R by faR. MATLAB has its own way of doing things, which, to be honest, can probably be defended from many angles. Here are a few examples of not-so-subtle differences between R and MATLAB:
We are all standing on the shoulders of giants, and Bradley Efron is one such giant: he invented the bootstrap in 1979 and later wrote his very influential 2004 paper on Least Angle Regression (with accompanying software written in R).
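For readers who have not met the bootstrap, here is a minimal sketch of its basic nonparametric form: resample the data with replacement, recompute the statistic on each resample, and use the spread of those replicates as a standard-error estimate. The function name `bootstrap_se`, the default choice of the median and the sample numbers are my own, for illustration only.

```python
import random
import statistics

def bootstrap_se(data, stat=statistics.median, n_boot=2000, seed=1):
    """Nonparametric bootstrap estimate of the standard error of `stat`.

    Draws n_boot resamples of len(data) with replacement, applies `stat`
    to each, and returns the standard deviation of the replicates.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = [stat(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(replicates)

sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]
se_median = bootstrap_se(sample)
se_mean = bootstrap_se(sample, stat=statistics.mean)
print(round(se_median, 3), round(se_mean, 3))
```

The appeal is that the same few lines work for the median, a trimmed mean or a correlation, where a closed-form standard error may be messy or unavailable. For the mean, the result should land close to the textbook value (population standard deviation divided by the square root of n, about 0.41 here).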
I just finished reading “An estimate of the science-wise false discovery rate and application to the top medical literature”. The authors ask how much of what we read in scientific journals is actually incorrect, or false.
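Why the answer is not simply "5%" can be seen with a standard back-of-the-envelope calculation. This is not the estimation method used in the paper. The inputs (the share of tested hypotheses whose null is true, the significance level α and the power) are assumptions you would have to supply, and the function name is hypothetical.

```python
def false_discovery_share(prop_null, alpha=0.05, power=0.8):
    """Expected share of 'significant' findings that are false positives.

    prop_null: assumed fraction of tested hypotheses where the null is true.
    alpha: significance threshold (false-positive rate under the null).
    power: probability of rejecting the null when it is false.
    """
    false_pos = prop_null * alpha
    true_pos = (1 - prop_null) * power
    return false_pos / (false_pos + true_pos)

for p0 in (0.5, 0.8, 0.9):
    print(p0, round(false_discovery_share(p0), 3))
```

With α = 0.05 and power 0.8, the share of false discoveries rises from about 6% when half the nulls are true to 36% when 90% of them are, which is why the prior proportion of true nulls matters so much.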
Presenting properly is important. Here is how I think it should look:
Slides 18 and 30 are especially nice: