R tips and tricks – the pipe operator

The R language has improved over the years. Amidst numerous splendid augmentations, the magrittr package by Stefan Milton Bache allows us to write more readable code. It uses an ingenious piping convention, which is explained shortly. This post talks about when to use pipes in your code and when to avoid them. I am all about readability, but I am also about speed. Use the pipe operator, but watch the tradeoff.

Piping convention example

We first get some data – say a matrix of daily returns for some sector ETFs:
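The original data is not reproduced here, so as a stand-in, here is a minimal sketch that simulates such a matrix. The tickers, the number of days and the normally distributed returns are all assumptions made for illustration, not the data used in this post.

```r
# Simulated stand-in for a matrix of daily sector-ETF returns (assumption:
# the real series are not available here, so we draw random numbers instead)
set.seed(1)
tickers <- c("XLE", "XLF", "XLK", "XLV")  # illustrative choice of tickers
n_days  <- 250

timeseries <- matrix(rnorm(n_days * length(tickers), mean = 0, sd = 0.01),
                     nrow = n_days,
                     dimnames = list(NULL, tickers))

# The first return is missing when returns are computed from prices,
# so we blank out the first row to have some NAs to remove later
timeseries[1, ] <- NA

dim(timeseries)
```

Every column here has the same number of missing values, which keeps the result of the NA removal a matrix rather than a list.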

Now we would like to do the following:
1. Remove NA’s
2. Get the rankings in each row
3. Transpose the result so that each column represents a ticker.
Here is the code for those steps without using pipes:

    t(apply(apply(timeseries, 2, na.omit), 1, rank))

And this is how the same operations look using the pipe operator %>% (make sure the magrittr library is loaded):

    timeseries %>% apply(2, na.omit) %>% apply(1, rank) %>% t()

You can check that the two ways return exactly the same result using the function all.equal.
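As a rough sketch of that check, the snippet below runs both versions on a small simulated matrix and compares them. The matrix and its column names are made up for the example; any numeric matrix with the same number of NAs in each column should behave the same way.

```r
library(magrittr)

# Small simulated matrix (assumption: stands in for the real return data)
set.seed(42)
timeseries <- matrix(rnorm(40), nrow = 10,
                     dimnames = list(NULL, c("A", "B", "C", "D")))
timeseries[1, ] <- NA  # one missing row, the same in every column

# Nested version: read from the inside out
nested <- t(apply(apply(timeseries, 2, na.omit), 1, rank))

# Piped version: read from left to right
piped <- timeseries %>% apply(2, na.omit) %>% apply(1, rank) %>% t()

# Compare the two results
all.equal(nested, piped)
```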

Readability-wise, using pipes makes the code much easier to understand. On the object timeseries, first (1) remove NA’s, then (2) rank, then (3) transpose.
Without pipes, the code is, well... ugly. At first read, you really need to be quite an experienced R user to understand what is happening.

The piping convention makes a huge difference for R newcomers.

Now, what about speed? Does making your code more readable slow it down? Not in this case. The microbenchmark library conveniently times our operations:
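The timing call could look roughly like the sketch below. It assumes a simulated matrix in place of the real data, and the labels nested and piped are just names chosen for the two expressions.

```r
library(magrittr)
library(microbenchmark)

# Simulated stand-in for the return matrix (assumption, not the real data)
set.seed(1)
timeseries <- matrix(rnorm(1000), nrow = 250)
timeseries[1, ] <- NA

# Time both versions, 100 evaluations each
bench <- microbenchmark(
  nested = t(apply(apply(timeseries, 2, na.omit), 1, rank)),
  piped  = timeseries %>% apply(2, na.omit) %>% apply(1, rank) %>% t(),
  times  = 100
)

# The printed summary includes the median time per expression
print(bench)
```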

What this code does is run the same operation 100 times, measuring each time how long it took to complete. Looking at the median of those 100 runs, the differences are negligible. So by using pipes we gain readability without losing any efficiency. But that is not always the case.

Using pipes can considerably slow down your code

Don’t use pipes blindly if you care about speed. On many occasions they can materially slow down your code.

For example, when we created the data we converted it to a matrix. But there are many ways to mold your data into a comfortable format. Say we use the unlist function to convert from a list to a numeric vector. In that case the readability gains from using the pipe operator carry a steep cost in terms of speed. Let’s compare
retd %>% lapply(unlist) with lapply(retd, unlist):
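A comparison along these lines can be sketched as follows. The structure of retd is an assumption here: it is taken to be a list holding one data frame of returns per ticker, filled with simulated numbers.

```r
library(magrittr)
library(microbenchmark)

# Hypothetical stand-in for retd: a list with one data frame of returns
# per ticker (assumption: the real object is not shown in this post)
set.seed(1)
retd <- lapply(1:10, function(i) data.frame(ret = rnorm(250, sd = 0.01)))

# Time the piped and the plain call, 100 evaluations each
bench_list <- microbenchmark(
  piped = retd %>% lapply(unlist),
  plain = lapply(retd, unlist),
  times = 100
)

print(bench_list)
```

The size of the gap depends on your machine and on the installed version of magrittr, so it is worth running this on your own setup.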

[Figure: timing the pipe operator. Log of the time (milliseconds), 100 repetitions]

As you can see, using pipes is decisively slower.

Main point

Use pipes when speed is a non-issue. When speed is important, dive deeper to see if you need to sacrifice readability for speed.