R tips and tricks – get the gist

In scientific programming speed is important. Functions written for general public use carry a lot of control-flow checks, which are not necessary if you are confident in your code. To speed up execution I suggest stripping run-of-the-mill functions down to their bare bones. You can save serious wall-clock time by keeping only the lines that do the actual work. Below is a walk-through example of what I mean.

I use the `quantile` function for the example. There are many ways to estimate a quantile, and all of them are coded into the one `quantile` function. The default argument `type = 7` selects the particular estimator we wish to use. Since R is an open-source language you can easily find the code for any function and then “fish out” only the lines that you actually need. While the code for the `quantile` function is around 90 lines (given below), the real labor is carried out mainly by lines 49 to 58, the main workhorse for the `type = 7` default.
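One way to pull up that source yourself, as a rough sketch: `quantile` is an S3 generic, so the code that runs for a plain numeric vector lives in the method `quantile.default` in the stats package. The names `src_fun` and `src_lines` are mine, and the line count you get depends on your R version and on how `deparse` wraps lines, so it need not match the numbers quoted above.

```r
# Fetch the default S3 method behind quantile() for plain numeric vectors.
src_fun <- utils::getS3method("quantile", "default")

# Deparse the function into a character vector, one element per source line.
# Comments from the original source file are generally not preserved here.
src_lines <- deparse(src_fun)

# Rough size of the function body.
length(src_lines)

# Show the lines that mention the `type` argument, to locate the branch of interest.
grep("type", src_lines, value = TRUE)
```

Typing `stats:::quantile.default` at the console prints the same function in full.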

Now, let’s write our own version of the `quantile` function; call it `lean_quantile`. Then we make sure our `lean_quantile` does what it is meant to do, and compare the execution time.
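A minimal sketch of what such a function might look like, keeping only the `type = 7` interpolation logic. This is a reconstruction, not necessarily the exact code used for the timings in this post. It assumes `x` is a non-empty numeric vector without `NA`s and that every value in `probs` lies in [0, 1]; none of that is checked.

```r
# Stripped-down type-7 quantile: no argument checks, no NA handling, no names.
lean_quantile <- function(x, probs) {
  n <- length(x)
  # Fractional position of each requested quantile in the sorted data.
  index <- 1 + (n - 1) * probs
  lo <- floor(index)
  hi <- ceiling(index)
  # Partial sort: only the positions we actually read need to be in place.
  x <- sort(x, partial = unique(c(lo, hi)))
  qs <- x[lo]
  # Interpolate only where the position falls strictly between two distinct values.
  i <- which(index > lo & x[hi] != qs)
  h <- (index - lo)[i]
  qs[i] <- (1 - h) * qs[i] + h * x[hi[i]]
  qs
}

lean_quantile(c(1, 2, 3, 4, 5), c(0.25, 0.5))  # returns 2 3
```

Unlike `quantile`, the result carries no percentage names, which is one more small saving.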

Check that our `lean_quantile` does what it is meant to do:
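A check along these lines could look as follows. The `lean_quantile` sketch is defined inside the snippet so that it runs on its own; the simulated data, the seed and the chosen probabilities are arbitrary illustrative choices.

```r
# Sketch of a stripped-down type-7 quantile (assumes clean numeric input, probs in [0, 1]).
lean_quantile <- function(x, probs) {
  n <- length(x)
  index <- 1 + (n - 1) * probs
  lo <- floor(index)
  hi <- ceiling(index)
  x <- sort(x, partial = unique(c(lo, hi)))
  qs <- x[lo]
  i <- which(index > lo & x[hi] != qs)
  h <- (index - lo)[i]
  qs[i] <- (1 - h) * qs[i] + h * x[hi[i]]
  qs
}

set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rnorm(1e4)
probs <- c(0.01, 0.25, 0.5, 0.75, 0.99)

# Reference values from the library function; names dropped to compare numbers only.
reference <- quantile(x, probs = probs, names = FALSE)
ours <- lean_quantile(x, probs)

# Stops with an error if the two disagree beyond numerical tolerance.
stopifnot(isTRUE(all.equal(ours, reference)))
```

It is worth repeating the comparison on inputs with ties and with very few observations, since those are the cases where a stripped-down version is most likely to drift from the original.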

Now we can compare the execution time (more on timing and profiling code):
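A timing comparison can be sketched with base R's `system.time`, so no extra package is needed. The vector length, the number of repetitions and the names `t_base` and `t_lean` are my own choices, and the numbers you get will vary with machine and R version. `lean_quantile` is defined again inside the snippet so that it runs on its own.

```r
# Sketch of a stripped-down type-7 quantile (assumes clean numeric input, probs in [0, 1]).
lean_quantile <- function(x, probs) {
  n <- length(x)
  index <- 1 + (n - 1) * probs
  lo <- floor(index)
  hi <- ceiling(index)
  x <- sort(x, partial = unique(c(lo, hi)))
  qs <- x[lo]
  i <- which(index > lo & x[hi] != qs)
  h <- (index - lo)[i]
  qs[i] <- (1 - h) * qs[i] + h * x[hi[i]]
  qs
}

set.seed(1)
x <- rnorm(1e5)
probs <- c(0.05, 0.5, 0.95)
reps <- 200   # repeat the call many times so the elapsed time is measurable

# Elapsed wall-clock seconds for the library function and for the lean version.
t_base <- system.time(for (k in seq_len(reps)) quantile(x, probs = probs))[["elapsed"]]
t_lean <- system.time(for (k in seq_len(reps)) lean_quantile(x, probs))[["elapsed"]]

c(base = t_base, lean = t_lean)
```

For finer-grained measurements a dedicated benchmarking package is a better fit than a hand-rolled loop.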

Execution time is reduced by over 60%, and we did not have to work very hard for it. We could dive further and also improve on the `sort` function which our `lean_quantile` uses, but you get the idea.

Is it a free lunch? Of course not.

It takes a long time to master efficient programming, and the functions you find in the public domain have probably been well scrutinized, both before and after they were published. When you tinker with the internals you risk making a mistake, erasing an important line, or creating unintended consequences that break the original behavior. So meticulous checks are well worth doing.

Some functions are written so efficiently that you will find very little value in pulling out just the workhorse, but with most functions written for the general public you can usually squeeze out some time savings. As you can see, this “get the gist” tip has good potential to save a lot of waiting time.

Footnotes

As a side note, it would be nice to do the same in Python, but the source code for the numpy quantile function is heavily “decorated”. Comment if you know how to create the Python counterpart.

6 comments on “R tips and tricks – get the gist”

1. Kent Johnson says:

Note that the code you have removed is mostly error-checking for incorrect arguments and out-of-range values, and correct handling of NAs. So your fast version is less robust than the library version and should only be used when you are certain that the arguments are correct.

1. I agree wholeheartedly. It is by no means a free lunch. When you tinker with the internals you risk making a mistake, erasing an important line, or creating unintended consequences that break the original behavior. It is indeed useful only if you are very comfortable with your code base and coding skills.

2. Now it would be nice to get a review of methods for finding a function's source code. It is not always as easy as for quantile().

1. You are right. If I encounter “primitive” functions, or functions which are sent to C++ or Fortran I usually let it go. For getting the source code for functions which are written as methods you can check here.