R Packages Download Stats

One big advantage of using open-source tools is the fantastic ecosystem that typically accompanies them. Being able to tap into a massive open-source community by downloading freely available code is decidedly useful. But, yes, there are downsides to downloads.

For one, there are too many packages out there, including imperfect duplicates. You can easily end up downloading a package or module that is inferior to an existing alternative. Second, there is the matter of security. I myself try to refrain from downloading relatively new code that is not yet tried and true. How do we know if a package is solid?
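One rough proxy for "solid" is how often a package gets downloaded. Here is a minimal sketch, assuming the cranlogs package is installed and you are online. The two package names are just placeholders I picked for illustration, and the helper total_downloads is my own name, not part of cranlogs.

```r
# Sum daily download counts per package, largest first.
# Expects columns `package` and `count`, as returned by
# cranlogs::cran_downloads().
total_downloads <- function(daily) {
  totals <- aggregate(count ~ package, data = daily, FUN = sum)
  totals[order(-totals$count), ]
}

# Needs the cranlogs package and an internet connection, so it is guarded.
if (requireNamespace("cranlogs", quietly = TRUE)) {
  daily <- tryCatch(
    cranlogs::cran_downloads(
      packages = c("data.table", "dplyr"),  # placeholder package names
      when = "last-month"
    ),
    error = function(e) NULL  # e.g. no network
  )
  if (!is.null(daily)) print(total_downloads(daily))
}
```

Keep in mind that these counts come from a single CRAN mirror's logs, so treat them as a relative signal rather than an exact number.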


R tips and tricks – shell.exec

When you start up your machine, the first thing you do is open the various programs you work with. Examples: your note-taking program, the pdf file that you need to read, the ppt file you were last working on, and of course your strongest link with the outside world nowadays, your email box. This post shows how to automate this process. Windows machines notoriously need restarting for every little (un)install. I trust you will find this startup automation advice handy.
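To give a flavor of the idea, here is a minimal sketch, assuming a Windows machine (shell.exec exists only in Windows builds of R). The file paths are hypothetical, and open_all is a helper name I made up for this illustration.

```r
# Hypothetical files to open at startup; replace with your own paths.
startup_items <- c(
  "C:/Users/me/Documents/notes.txt",
  "C:/Users/me/Documents/paper.pdf",
  "C:/Users/me/Documents/slides.pptx"
)

# Open each existing item with its associated application.
# `opener` defaults to shell.exec (Windows only); it is an argument
# so the loop itself can be tried on other systems.
open_all <- function(items, opener = shell.exec) {
  found <- normalizePath(items[file.exists(items)])
  for (item in found) opener(item)
  invisible(found)
}

if (.Platform$OS.type == "windows") {
  open_all(startup_items)
}
```

Paths that do not exist are silently skipped, so the same script can be shared across machines.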


R tips and tricks – readClipboard

Here is a small utility function to save you some boring work.

Say you have a file to read into R. The file path is C:\Users\folder1\folder2\folder3\mydata.csv. So what do you do? You copy the path, paste it into the editor, and start changing each backslash into a forward slash so that R can read your file.

With the help of the rstudioapi package, the readClipboard function, and a small wrapper function (called get_path here), you can:

1. Simply copy the path C:\Users\folder1\folder2\folder3\mydata.csv.
2. Execute pathh <- get_path().
3. Use pathh, which is now R-ready.

No more reversing or escaping backslashes.
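A get_path() helper along these lines could look as follows. This is a minimal sketch under my own assumptions: it relies only on readClipboard (in utils, Windows only) and leaves out the rstudioapi part, and the `path` argument is something I added so the conversion can be tried without touching the clipboard.

```r
# Convert a Windows path to forward slashes so R can read it.
# `path` defaults to the clipboard contents; readClipboard() is
# available on Windows only, so pass a string elsewhere.
get_path <- function(path = readClipboard()) {
  chartr("\\", "/", path)
}

# Passing the path explicitly (backslashes escaped in the R string):
get_path("C:\\Users\\folder1\\folder2\\folder3\\mydata.csv")
# "C:/Users/folder1/folder2/folder3/mydata.csv"
```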

R tips and tricks – Timing and profiling code

Modern statistical methods rely on simulations: generating different scenarios and repeating them thousands of times over. As a result, even trivial operations add up and weigh on computation time.

In the words of my favorite statistician Bradley Efron:

“There is some sort of law working here, whereby statistical methodology always expands to strain the current limits of computation.”

In addition to the need for faster computation, the richness of the open-source ecosystem means that you often encounter different functions doing the same thing, sometimes even under the same name. This post explains how to measure the computational efficiency of a function so you know which one to use, with a couple of actual examples for reducing computation time.
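As a quick taste, here is a minimal sketch using base R's system.time to compare two ways of computing column means. The matrix size is an arbitrary choice of mine, and the timings you get will depend on your machine.

```r
set.seed(1)
x <- matrix(rnorm(1e6), ncol = 100)  # 10,000 rows, 100 columns

# Two functions doing the same job
t_apply    <- system.time(m1 <- apply(x, 2, mean))
t_colmeans <- system.time(m2 <- colMeans(x))

# Check that both give the same answer before comparing speed
all.equal(m1, m2)

# Elapsed wall-clock seconds for each approach
c(apply = t_apply[["elapsed"]], colMeans = t_colmeans[["elapsed"]])
```

For very fast operations a single run is noisy, so it usually pays to repeat the call many times inside system.time, or to use a dedicated benchmarking package.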


R + Python = Rython

Enough! Enough with that pointless R versus Python debate. I find it almost as pointless as the Bayesian vs Frequentist “dispute”. I advocate here what I advocated there (“..don’t be a Bayesian, nor be a Frequentist, be opportunist”).

Nowadays even marginally tedious computation is being sent to faster, minimum-overhead languages like C++. So it’s mainly syntax administration that we keep insisting on. What does it matter if we have this:

Or that


R tips and tricks, on-screen colors

I like using R for many reasons. Two of those are (1) its easy integration with almost any software you can think of, and (2) its graphical powers. Color-wise, I dare to assume you have plotted, re-specified your colors, plotted again, and iterated until you found what works for your specific chart. Here you can find a modern visualization that lets you quickly find the colors you are looking for and quickly see how they look on screen. See below for a quick demo.
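If you just want to eyeball a few candidates without any extra tooling, a base R swatch can be sketched like this. It is not the visualization tool this post is about, only a minimal stand-in, and show_colors is a helper name I made up.

```r
# Draw one labeled swatch per color and return the RGB values invisibly.
show_colors <- function(cols) {
  n <- length(cols)
  plot.new()
  plot.window(xlim = c(0, n), ylim = c(0, 1))
  rect(seq_len(n) - 1, 0, seq_len(n), 1, col = cols, border = NA)
  text(seq_len(n) - 0.5, 0.5, labels = cols, srt = 90)
  invisible(col2rgb(cols))
}

# First eight built-in color names containing "blue"
candidates <- grep("blue", colors(), value = TRUE)[1:8]
show_colors(candidates)
```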


R Journal publication

The R Journal is the open access, refereed journal of the R project for statistical computing. It features short to medium length articles covering topics that should be of interest to users or developers of R.

Christoph Weiss, Gernot Roetzer and I joined forces to write an R package and the accompanying paper: Forecast Combinations in R using the ForecastComb Package, which is now published in the R Journal. Below you can find a few of my thoughts about the journey towards publication in the R Journal, and a few words about working in a small team of three, from three different locations.


R tips and tricks – the locator function

How many times have you placed a legend in an R plot only to discover it is overrun by some points or lines in the chart? Usually what comes next is a trial-and-error phase where you adjust the location, changing the x and y coordinate arguments and re-drawing the plot to check whether the legend or text is now positioned such that it is fully readable.
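The locator function lets you click on the plot instead of guessing coordinates. Here is a minimal sketch, assuming an interactive graphics device. The wrapper place_legend and its `pos` argument are my own additions, so the coordinates can also be supplied by hand.

```r
# Place a legend at `pos`, a list with elements x and y.
# By default the position comes from one mouse click via locator(1),
# which needs an interactive graphics device.
place_legend <- function(labels, pos = locator(1), ...) {
  legend(pos$x, pos$y, legend = labels, ...)
}

set.seed(1)
plot(cumsum(rnorm(100)), type = "l", xlab = "time", ylab = "value")

if (interactive()) {
  # Click on an empty area of the plot to drop the legend there
  place_legend("random walk", lty = 1)
}
```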


Show yourself (look “under the hood” of a function in R)

Open source software has many virtues, and being free is not the least of them. However, open source comes with “ABSOLUTELY NO WARRANTY” and with no power comes no responsibility (I wonder..). Since no one is paying, by definition it is your sole responsibility to make sure the code does what it is supposed to be doing. Thus, looking “under the hood” of a function written by someone else can be of service. There are more reasons to examine the actual underlying code.
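A few base R ways to peek at a function's code, as a minimal sketch. The functions inspected here (sd, summary, t.test) are arbitrary picks of mine for illustration.

```r
# Typing a function's name without parentheses prints its source
sd

# Arguments and body can also be pulled out separately
args(sd)
body(sd)

# Generic functions dispatch to methods; list them first
methods(summary)

# Methods that are not exported can still be retrieved
src <- getS3method("t.test", "default")
src

# getAnywhere searches all loaded namespaces
getAnywhere("t.test.default")
```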


Adding text to R plot

Diversity is a real strength. By now it is common knowledge. I often see institutions openly encourage a multinational environment and multidisciplinary professionals, with specific “on-the-job” training tailored to their own needs. No one knows a lot about a lot, so bringing different people together enhances independent thinking and the knowledge available to the organization. Clarity of communication then becomes even more important, and making sure your figures are quickly understandable goes a long way.
