In this post about the most popular machine learning R packages I showed the incredible- exponential growth displayed by R software, measured by the number of package downloads. Here is another graph which shows a more linear growth in R (and an impressive growth in python) as measured by % of question posted in stack overflow
This is more an Rstudio tip than an R tip. It would be nice to know how the following works for different editors, but Rstudio is common enough and awesome enough for the following to be relevant.
Insert or bind?
This is the first in a series of planned posts, sharing some R tips and tricks. I hope to cover topics which are not easily found elsewhere. This post has to do with loops in R. There are two ways to save values when looping:
1. You can predefine a vector and fill it, or
2. you can recursively bind the values.
Which one is faster?
In part 1 of Good coding practices we considered how best to code for someone else, may it be a colleague who is coming from Excel environment and is unfamiliar with scripting, a collaborator, a client or the future-you, the you few months from now. In this second part, I give some of my thoughts on how best to write functions, the do’s and dont’s.
At work, I recently spent a lot of time coding for someone else, and like anything else you do, there is much to learn from it. It also got me thinking about scripting, and how best to go about it. To me it seems that the new working generation mostly tries to escape from working with Excel, but “let’s not kid ourselves: the most widely used piece of software for statistics is Excel” (Brian D. Ripley). this quote is 15 years old almost, but Excel still has a strong hold on the industry.
Here I discuss few good coding practices. Coding for someone else is not to be taken literally here. ‘Someone else’ is not necessarily a colleague, it could just as easily be the “future you”, the you reading your code six months from now (if you are lucky to get responsive referees). Did it never happened to you that your past-self was unduly cruel to your future-self? that you went back to some old code snippets and dearly regretted not adding few comments here and there? Of course it did.
Unlike the usual metric on which “good” is usually measured by when it comes to coding: good = efficient, here the metric would be different: good = friendly. They call this literate programming. There is a fairly deep discussion about this paradigm by John D. cook (follow what he has to say if you are not yet doing it, there is something for everyone).
Few weeks back I gave a talk in the R/Finance 2016 conference, about forecast combinations in R. Here are the slides:
The good thing about using open-source software is the community around it. There are very many R packages online, and recently CRAN package download logs were released. This means we can have a look at the number of downloads for each package, so to get a good feel for their relative popularity. I pulled the log files from the server and checked a few packages which are known to be related to machine learning. With this post you can see which are the community favorites, and get a feel for the R-software trend growth.
Especially in economics/econometrics, modellers do not believe their models reflect reality as it is. No, the yield curve does NOT follow a three factor Nelson-Siegel model, the relation between a stock and its underlying factors is NOT linear, and volatility does NOT follow a Garch(1,1) process, nor Garch(?,?) for that matter. We simply look at the world, and try to find an apt description of what we see.
Open source software has many virtues. Being free is not the least of which. However, open source comes with “ABSOLUTELY NO WARRANTY” and with no power comes no responsibility (I wonder..). Since no one is paying, by definition it is your sole responsibility to make sure the code does what it is supposed to be doing. Thus, looking “under the hood” of a function written by someone else is can be of service. There are more reasons to examine the actual underlying code.
In April this year, Rstudio notified early users of shiny that Glimmer and Spark servers which host interactive-applications would be decommissioned. Basically, the company is moving forward to generate revenues from this great interactive application service. For us aspirants who use the service strictly as a hobby, that means, in a word: pay.
Basic subscription now costs around 40$ per month. Keeping your applications free of charge is possible BUT, as long as it is not used for more than 25 hours per month. So if your site generate some traffic, most users would simply not be able to access the app. Apart from that, you are subject to some built-in Rstudio’s logo which can’t be removed without having a paid subscription. That is a shame, but a company’s gotta eat right? I am using Rstudio’s services from their very beginning, and the company definitely deserve to eat! only I wish there would be another step between the monthly 0$ option which provides too slim capabilities, and the monthly 40$ option which is, in my admittedly biased opinion, too pricey for a ‘sometimes’ hobby.
Diversity is a real strength. By now it is common knowledge. I often see institutions openly encourage multinational environment and multidisciplinary professionals, with specific “on-the-job” training to tailor for own needs. No one knows a lot about a lot, so bringing different together enhance independent thinking and knowledge available to the organization. Clarity of communication then becomes even more important, and making sure your figures are quickly understandable goes a long way.
One of my Ph.D papers was published recently. It deals with yield curve forecasting.
Here is the code for applying the Nelson-Siegel model to any yield curve.
Package Eplot on cran.
At least for me, R by faR. MATLAB has its own way of doing things, which to be honest can probably be defended from many angles. Here are few examples for not so subtle differences between R and MATLAB: