In scientific programming speed is important. Functions written for general public use have a lot of control-flow checks which are not necessary if you are confident enough with your code.To quicken your code execution I suggest to strip run-of-the-mill functions to their bare bones. You can save serious wall-clock time by using only the laborers code lines. Below is a walk-through example of what I mean.

I use the `quantile`

function for the example. There are many ways to compute the estimate of a quantile, and all those various ways are coded into the one `quantile`

function. The function has the default argument `type = 7`

which indicates the particular way we wish to estimate our quantiles. Given that *R* is an open-source language you can easily find the code for any function, then you can “fish out” only the lines that you actually need. While the code for the `quantile`

function is around 90 lines (given below), the real labor is carried out mainly by lines 49 to 58 – the main workhorse (for the type=7 default).

Now, let’s write our own version of the `quantile`

function; call it `lean_quantile`

. Then we make sure our `lean_quantile`

does what its meant to do, and compare the execution time.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
lean_quantile <- function(x, probs = seq(0, 1, 0.25)) { n <- length(x) np <- length(probs) index <- 1 + (n - 1) * probs lo <- floor(index) hi <- ceiling(index) x <- sort(x, partial = unique(c(lo, hi))) qs <- x[lo] i <- which(index > lo) h <- (index - lo)[i] qs[i] <- (1 - h) * qs[i] + h * x[hi[i]] qs } |

Check that our `lean_quantile`

does what its meant to do:

1 2 3 4 5 |
tmpp <- rnorm(10) all( quantile(tmpp) == lean_quantile(tmpp) ) [1] TRUE |

Now we can compare the execution time (more on timing and profiling code):

1 2 3 4 5 6 7 8 9 10 11 12 |
library(microbenchmark) # citation("microbenchmark") bench <- microbenchmark(quantile(rnorm(10)), lean_quantile(rnorm(10)), times=10^4) bench # Unit: microseconds # expr min lq mean median uq max neval # quantile(rnorm(10)) 79.1 84.3 96.3 86.1 89.0 3907 10000 # lean_quantile(rnorm(10)) 27.9 31.6 36.2 33.4 34.8 4741 10000 |

Execution time is reduced by over 60%. Also, we did not have to work very hard for it. We can do more, diving further and improve the `sort`

function which our `lean_quantile`

uses, but you get the idea.

Is it a free lunch? Of course not.

It takes long to master efficient programming, and the functions you find in the public domain are probably well scrutinized – before and after they go up there. When you mingle with the internals you risk making a mistake, erasing an important line or creating unintended consequences and messing up the original behavior. So meticulous checks are good to do.

While some functions are written so efficiently that you will find very little value in pulling out just the workhorse, with most functions written for the general public you will certainly be able to squeeze out some time-profit. As you can see this “get the gist” tip has excellent potential to save a lot of waiting time.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 |
# Part of the R package, https://www.R-project.org # # Copyright (C) 1995-2014 The R Core Team # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # A copy of the GNU General Public License is available at # https://www.R-project.org/Licenses/ quantile <- function(x, ...) UseMethod("quantile") quantile.POSIXt <- function(x, ...) .POSIXct(quantile(unclass(as.POSIXct(x)), ...), attr(x, "tzone")) quantile.default <- function(x, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE, type = 7, ...) { if(is.factor(x)) { if(!is.ordered(x) || ! type %in% c(1L, 3L)) stop("factors are not allowed") lx <- levels(x) } else lx <- NULL if (na.rm) x <- x[!is.na(x)] else if (anyNA(x)) stop("missing values and NaN's not allowed if 'na.rm' is FALSE") eps <- 100*.Machine$double.eps if (any((p.ok <- !is.na(probs)) & (probs < -eps | probs > 1+eps))) stop("'probs' outside [0,1]") n <- length(x) if(na.p <- any(!p.ok)) { # set aside NA & NaN o.pr <- probs probs <- probs[p.ok] probs <- pmax(0, pmin(1, probs)) # allow for slight overshoot } np <- length(probs) if (n > 0 && np > 0) { if(type == 7) { # be completely back-compatible index <- 1 + (n - 1) * probs lo <- floor(index) hi <- ceiling(index) x <- sort(x, partial = unique(c(lo, hi))) qs <- x[lo] i <- which(index > lo) h <- (index - lo)[i] # > 0 by construction ## qs[i] <- qs[i] + .minus(x[hi[i]], x[lo[i]]) * (index[i] - lo[i]) ## qs[i] <- ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]]) qs[i] <- (1 - h) * qs[i] + h * x[hi[i]] } else { if (type <= 3) { ## Types 1, 2 and 3 are discontinuous sample qs. nppm <- if (type == 3) n * probs - .5 # n * probs + m; m = -0.5 else n * probs # m = 0 j <- floor(nppm) h <- switch(type, (nppm > j), # type 1 ((nppm > j) + 1)/2, # type 2 (nppm != j) | ((j %% 2L) == 1L)) # type 3 } else { ## Types 4 through 9 are continuous sample qs. switch(type - 3, {a <- 0; b <- 1}, # type 4 a <- b <- 0.5, # type 5 a <- b <- 0, # type 6 a <- b <- 1, # type 7 (unused here) a <- b <- 1 / 3, # type 8 a <- b <- 3 / 8) # type 9 ## need to watch for rounding errors here fuzz <- 4 * .Machine$double.eps nppm <- a + probs * (n + 1 - a - b) # n*probs + m j <- floor(nppm + fuzz) # m = a + probs*(1 - a - b) h <- nppm - j if(any(sml <- abs(h) < fuzz)) h[sml] <- 0 } x <- sort(x, partial = unique(c(1, j[j>0L & j<=n], (j+1)[j>0L & j<n], n)) ) x <- c(x[1L], x[1L], x, x[n], x[n]) ## h can be zero or one (types 1 to 3), and infinities matter #### qs <- (1 - h) * x[j + 2] + h * x[j + 3] ## also h*x might be invalid ... e.g. Dates and ordered factors qs <- x[j+2L] qs[h == 1] <- x[j+3L][h == 1] other <- (0 < h) & (h < 1) if(any(other)) qs[other] <- ((1-h)*x[j+2L] + h*x[j+3L])[other] } } else { qs <- rep(NA_real_, np) } if(is.character(lx)) qs <- factor(qs, levels = seq_along(lx), labels = lx, ordered = TRUE) if(names && np > 0L) { names(qs) <- format_perc(probs) } if(na.p) { # do this more elegantly (?!) o.pr[p.ok] <- qs names(o.pr) <- rep("", length(o.pr)) # suppress <NA> names names(o.pr)[p.ok] <- names(qs) o.pr } else qs } |

### Footnotes

As a side note, would be nice to do that in Python also, but the source code for the numpy quantile function is heavily “decorated”. Comment if you know how to create the Python counterpart.

Note that the code you have removed is mostly error-checking for incorrect arguments and out-of-range values, and correct handling of NAs. So your fast version is less robust than the library version and should only be used when you are certain that the arguments are correct.

I agree wholeheartedly. It is by no means a free lunch. When you mingle with the internals you risk making a mistake, erasing an important line or creating unintended consequences and messing up the original behavior. It is indeed useful only if you are, very, comfortable with your code base and coding skills.

Now it would be nice to get a review of methods of finding function source codes. It is not always as easy as for quantile().

You are right. If I encounter “primitive” functions, or functions which are sent to C++ or Fortran I usually let it go. For getting the source code for functions which are written as methods you can check here.