The R language has some quirks compared with other languages. One thing you need to constantly watch for when moving to or from R is that R starts indexing at one, while almost all other languages start at zero, which takes some getting used to. Another quirk, compared with other languages, is how explicit you must be when modifying a variable.
Take Python, for example (I think it looks much the same in most common languages):
```python
count = 1
while count < 5:
    print(count)
    count += 1
## 1
## 2
## 3
## 4
```
Notice the very elegant and super clear `count += 1`, which says that count is going to be increased by 1. Now look at the R way:
```r
count <- 1
while(count < 5) {
  print(count)
  count <- count + 1
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
```
You need to be explicit about "running over" (overwriting) the variable count, with the much lengthier `count <- count + 1`.
I did not care before, but after tasting other languages I am now annoyed by this, like the ticking of a clock that went unnoticed until someone kindly drew your attention to it.
Can we do something about it? In short, not really, but there is an ugly workaround.
You can create a function that overwrites its own argument. Note that this goes against the R philosophy; generally speaking, the advice is: "Don't". But we also like to live dangerously.
We can use the assign() function to modify a function's argument like so:
```r
# deparse(substitute(x)) recovers the *name* of the variable passed in,
# and assign() writes the new value under that name in the global environment
"+1" <- function(x) {
  assign(deparse(substitute(x)), x + 1, envir = .GlobalEnv)
}

# or, if you want to increase by a rather than by 1:
"+=" <- function(x, a) {
  assign(deparse(substitute(x)), x + a, envir = .GlobalEnv)
}

count <- 1
while(count < 5) {
  print(count)
  "+1"(count)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
```
This is ugly, but understandable, and closer to what you may be used to from other languages.
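One caveat: because the functions above always write to .GlobalEnv, calling "+1"(count) inside another function leaves that function's local count untouched (so a while loop like the one above would never end there). A common variant passes `envir = parent.frame()` to assign(), which targets the environment of whoever called "+1". The sketch below assumes that change; count_to() is a hypothetical helper added only for illustration.

```r
# Variants of "+1" and "+=" that write into the caller's environment
# (parent.frame()) instead of always into .GlobalEnv
"+1" <- function(x) {
  assign(deparse(substitute(x)), x + 1, envir = parent.frame())
}
"+=" <- function(x, a) {
  assign(deparse(substitute(x)), x + a, envir = parent.frame())
}

# Hypothetical helper: with the .GlobalEnv version the local `count`
# would never change; this version updates it in place
count_to <- function(limit) {
  count <- 1
  while (count < limit) {
    "+1"(count)
  }
  count
}
count_to(5)   # returns 5

# At top level the caller's environment is the global environment
total <- 10
"+="(total, 5)
total         # 15
```

Either version relies on deparse(substitute(x)) to recover the variable's name, so it only makes sense for plain names: "+1"(v[1]) would create a new variable literally called `v[1]` rather than updating the vector v.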
A pipe-based alternative
I recently discovered the interesting and useful assignment pipe %<>% from the magrittr package. `x %<>% f` pipes x into f and then assigns the result back to x, that is, it updates x and runs over it. So the following also does the trick:
```r
library(magrittr)  # provides the %<>% operator

count <- 1
while(count < 5) {
  print(count)
  count %<>% +1  # pipe count into `+`(., 1) and store the result in count
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
```
This looks slightly better. However, every time I use a piping operator I recall what I wrote in the past, particularly regarding the speed of execution when using pipes. So let's have a look at that:
```r
library(microbenchmark)
library(magrittr)
citation("microbenchmark")

# assumes the "+1" function defined earlier
count <- 1
microbenchmark(count %<>% +1, "+1"(count), unit = "ms")
## Unit: milliseconds
##           expr      min       lq       mean    median       uq      max neval
##  count %<>% +1 0.078489 0.081809 0.08745984 0.0839220 0.086790 0.195012   100
##    `+1`(count) 0.008453 0.009962 0.01241394 0.0123775 0.013586 0.038942   100
```
You can see it was worth checking. Using the %<>% operator instead of our own "+1" function is very costly: roughly seven times slower, comparing mean or median times.
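Plausibly part of the difference is that %<>% does more work than a plain function call: it receives its right-hand side unevaluated and has to build the call `count + 1` before evaluating it and assigning the result back. The parse step can be inspected in base R without loading magrittr. The sketch below uses quote() to show that `count %<>% +1` parses with `+1` (a one-argument call to `+`) as its right-hand side, because unary plus binds more tightly than `%any%` operators, and then mimics the pipe's first-argument rule by hand. It illustrates the idea only; it is not magrittr's actual implementation.

```r
# Parse, but do not evaluate, the pipe expression (magrittr is not needed)
e <- quote(count %<>% +1)
identical(e[[1]], as.name("%<>%"))  # TRUE: %<>% is the outermost call
rhs <- e[[3]]                        # right-hand side: the call `+`(1)
length(rhs)                          # 2: the function `+` plus one argument

# Mimic the first-argument rule: put the lhs in front of the existing arguments
piped <- as.call(c(rhs[[1]], e[[2]], as.list(rhs)[-1]))
piped                                # count + 1

# The "assign back" half of %<>%
count <- 1
count <- eval(piped)
count                                # 2
```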
Finally, let's compare the speed with what you would do by default in R, which is to write out the increment explicitly:
```r
z1 <- expression(
  count <- 1,
  while(count < 5) {
    print(count)
    "+1"(count)
  }
)
z2 <- expression(
  count <- 1,
  while(count < 5) {
    print(count)
    count <- count + 1
  }
)
microbenchmark(eval(z1), eval(z2), unit = "ms")
## Unit: milliseconds
##      expr      min       lq     mean   median       uq       max neval
##  eval(z1) 3.582039 3.681053 3.824822 3.750938 3.919535  4.641919   100
##  eval(z2) 3.767692 3.934025 4.279643 4.035907 4.310764 10.034297   100
```
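Each of those runs also prints four lines, and printing most likely accounts for the bulk of the roughly 3.5 ms per run, since a single "+1" call costs about 0.01 ms. To time only the increments, here is a sketch using base R's system.time() (our choice, to avoid extra packages); the loop size n is arbitrary, and timings vary by machine, so none are shown.

```r
# The post's "+1", repeated so this block runs on its own
"+1" <- function(x) {
  assign(deparse(substitute(x)), x + 1, envir = .GlobalEnv)
}

n <- 1e4  # number of increments to time (arbitrary)

# Increment via the "+1" helper, with no print() in the loop
count <- 0
t_assign <- system.time(for (i in seq_len(n)) "+1"(count))
count_assign <- count

# Increment the default R way
count <- 0
t_base <- system.time(for (i in seq_len(n)) count <- count + 1)
count_base <- count

# Elapsed seconds for each approach
c(assign = t_assign[["elapsed"]], base = t_base[["elapsed"]])
```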
Using the "+1" function allows for comparable readability and, in this benchmark, slightly better speed (about 5% faster by minimum time and 7% by median), without any additional package installations. May you find the assign function useful in other contexts too.