The annual useR! conference

This year on 4th of July I will be attending the annual usrR! conference. While it is often in the US, this year the UseR! conference takes place in the nearby Brussels. Sweet.

The website is state-of-the-art “don’t make me think” style. The program looks amazing. Belgian beers with the R community, exciting. Registration still open.

Watch this space for highlights and afterthoughts.

Random Books

It seems like a very long while since my bachelor. Checking my bookshelf the other day I was thinking to flag some of those books which helped or inspired me along the way. Here they are in no particular order.


R tips and tricks – the locator function

How many times have you placed the legend in R plot to discover it is being overrun by some points or lines in the chart? Usually what comes next is a trial-and-error phase where you adjust the location, changing the arguments of the x and y coordinates, and re-drawing the plot again to check if the legend or text are now positioned such that they are fully readable.


Good coding practices – part 1


At work, I recently spent a lot of time coding for someone else, and like anything else you do, there is much to learn from it. It also got me thinking about scripting, and how best to go about it. To me it seems that the new working generation mostly tries to escape from working with Excel, but “let’s not kid ourselves: the most widely used piece of software for statistics is Excel” (Brian D. Ripley). this quote is 15 years old almost, but Excel still has a strong hold on the industry.

Here I discuss few good coding practices. Coding for someone else is not to be taken literally here. ‘Someone else’ is not necessarily a colleague, it could just as easily be the “future you”, the you reading your code six months from now (if you are lucky to get responsive referees). Did it never happened to you that your past-self was unduly cruel to your future-self? that you went back to some old code snippets and dearly regretted not adding few comments here and there? Of course it did.

Unlike the usual metric on which “good” is usually measured by when it comes to coding: good = efficient, here the metric would be different: good = friendly. They call this literate programming. There is a fairly deep discussion about this paradigm by John D. cook (follow what he has to say if you are not yet doing it, there is something for everyone).


Most popular machine learning R packages

The good thing about using open-source software is the community around it. There are very many R packages online, and recently CRAN package download logs were released. This means we can have a look at the number of downloads for each package, so to get a good feel for their relative popularity. I pulled the log files from the server and checked a few packages which are known to be related to machine learning. With this post you can see which are the community favorites, and get a feel for the R-software trend growth.


ASA statement on p-values

There are many problems with p-values, and I too have chipped in at times. I recently sat in a presentation of an excellent paper, to be submitted to the highest ranked journal in the field. The authors did not conceal their ruthless search for those mesmerizing asterisks indicating significance. I was curious to see many in the crowd are not aware of current history in the making regarding those asterisks.

The web is now swarming with thought-provoking discussions about the recent American Statistical Association (ASA) statement on p-values. Despite their sincere efforts, there are still a lot of back-and-forth over what they actually mean. Here is how I read it.


Most popular posts – 2015

The top three for the year are:
Out-of-sample data snooping
Code for my yield curve forecasting paper
Review of a couple of books
I personally enjoyed the most writing a few words on ML estimation, and about those great statistical discoveries. Since the last post did not involve any code or images I initially thought it would be a breeze. I in fact spent twice the time I usually do, and it was all good fun.

In 2015 I wrote quite a bit about volatility and correlation. In 2016 I plan to learn more (so to write more) about portfolio construction.

Present-day great statistical discoveries

Some time during the 18th century the biologist and geologist Louis Agassiz said: “Every great scientific truth goes through three stages. First, people say it conflicts with the Bible. Next they say it has been discovered before. Lastly they say they always believed it”. Nowadays I am not sure about the Bible but yeah, it happens.

I express here my long-standing and long-lasting admiration for the following triplet of present-day great discoveries. The authors of all three papers had initially struggled to advance their ideas, which echos the quote above. Here they are, in no particular order.


How regression statistics mislead experts

This post concerns a paper I came across checking the nominations for best paper published in International Journal of Forecasting (IJF) for 2012-2013. The paper bears the annoyingly irresistible title: “The illusion of predictability: How regression statistics mislead experts”, and was written by Soyer Emre and Robin Hogarth (henceforth S&H). The paper resonates another paper published in “Psychological review” (1973), by Daniel Kahneman and Amos Tversky: “On the psychology of prediction”. Despite the fact that S&H do not cite the 1973 paper, I find it highly related.


Dark background theme for reading

I am picking up on Rob Hyndman’s suggestion on dark themes for writing. I carried on with a bit of “internet-scientific” reading. Opinions on ‘dark vs white’ background themes, which is better for your eyes, are mixed. You are busy, so just the bottom line: do what works for you.