R tips and tricks – Timing and profiling code

Modern statistical methods use simulations; generating different scenarios and repeating those thousands of times over. Therefore, even trivial operations burden computational speed.

In the words of my favorite statistician Bradley Efron:

“There is some sort of law working here, whereby statistical methodology always expands to strain the current limits of computation.”

In addition to the need for faster computation, the richness of open-source ecosystem means that you often encounter different functions doing the same thing, sometimes even under the same name. This post explains how to measure the computational efficacy of a function so you know which one to use, with a couple of actual examples for reducing computational time.

Continue reading

Good coding practices – part 2

Introduction

In part 1 of Good coding practices we considered how best to code for someone else, may it be a colleague who is coming from Excel environment and is unfamiliar with scripting, a collaborator, a client or the future-you, the you few months from now. In this second part, I give some of my thoughts on how best to write functions, the do’s and dont’s.

Continue reading

Good coding practices – part 1

Introduction

At work, I recently spent a lot of time coding for someone else, and like anything else you do, there is much to learn from it. It also got me thinking about scripting, and how best to go about it. To me it seems that the new working generation mostly tries to escape from working with Excel, but “let’s not kid ourselves: the most widely used piece of software for statistics is Excel” (Brian D. Ripley). this quote is 15 years old almost, but Excel still has a strong hold on the industry.

Here I discuss few good coding practices. Coding for someone else is not to be taken literally here. ‘Someone else’ is not necessarily a colleague, it could just as easily be the “future you”, the you reading your code six months from now (if you are lucky to get responsive referees). Did it never happened to you that your past-self was unduly cruel to your future-self? that you went back to some old code snippets and dearly regretted not adding few comments here and there? Of course it did.

Unlike the usual metric on which “good” is usually measured by when it comes to coding: good = efficient, here the metric would be different: good = friendly. They call this literate programming. There is a fairly deep discussion about this paradigm by John D. cook (follow what he has to say if you are not yet doing it, there is something for everyone).

Continue reading

Show yourself (look “under the hood” of a function in R)

Open source software has many virtues. Being free is not the least of which. However, open source comes with “ABSOLUTELY NO WARRANTY” and with no power comes no responsibility (I wonder..). Since no one is paying, by definition it is your sole responsibility to make sure the code does what it is supposed to be doing. Thus, looking “under the hood” of a function written by someone else is can be of service. There are more reasons to examine the actual underlying code.

Continue reading