In a previous post, Most popular machine learning R packages, I tried to work out which machine learning packages are used most often, but I simply chose a few names from memory. However, there is a CRAN Task Views web page which “aims to provide some guidance which packages on CRAN are relevant for tasks related to a certain topic.” So instead of relying on my own experience, in this post I correct for that bias by looking at the packages listed under the topic Machine Learning & Statistical Learning. There are currently around 100 of those packages on CRAN.
Using the cranlogs library, I query the number of downloads for all of those packages over 2018.
Of course, the assumption here is that the number of downloads is a good proxy for (unobserved) popularity, but I think that is a fairly mild assumption.
Here is the result:
Most popular machine learning R packages:
These are the total numbers of downloads (divided by 10^4 for readability).
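As a rough illustration of how such totals can be obtained, here is a minimal sketch in base R. It assumes a data frame shaped like the one cranlogs returns (one row per package per day, with columns date, count and package). The data frame `toy`, its package names and its counts are made up for illustration and are not real download figures.

```r
# Toy stand-in for the output of cranlogs::cran_downloads():
# one row per package per day, with columns date, count and package.
# Package names and counts are invented for illustration only.
toy <- data.frame(
  date    = as.Date(c("2018-01-01", "2018-01-02", "2018-01-01", "2018-01-02")),
  count   = c(12000, 15000, 4000, 6000),
  package = c("pkgA", "pkgA", "pkgB", "pkgB"),
  stringsAsFactors = FALSE
)

# Sum the daily counts within each package (gives a named numeric vector).
totals <- vapply(split(toy$count, toy$package), sum, numeric(1))

# Order from most to least downloaded.
totals <- sort(totals, decreasing = TRUE)

# Rescale to units of 10^4 downloads for readability.
totals_scaled <- totals / 1e4
print(totals_scaled)
```

The same three steps (sum per package, sort, rescale) apply unchanged to the real download data.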
At the top of the list, most names look familiar, but not all. There are a few undiscovered (by me at least), potentially powerful packages.
Action points: to check
– John Fox, one of those R titans, wrote the effects package: Effect Displays for Linear, Generalized Linear, and Other Models.
– The arules package provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
– The plotmo package: Plot a Model’s Residuals, Response, and Partial Dependence Plots.
Code
The code I used to construct the data is below. Using the CRAN Task Views you can do the same with other tasks you care about, e.g. extreme value estimation, or handling missing data.
# install.packages(c("cranlogs", "data.table", "magrittr"))
library(cranlogs)
library(data.table)  # provides as.data.table()
library(magrittr)    # provides the %>% pipe

# Packages listed in the Machine Learning & Statistical Learning task view
pack_list <- c("ahaz,arules,BART,bartMachine,BayesTree,biglasso,bmrm,Boruta,bst,C50,caret,CORElearn,CoxBoost,Cubist,deepnet,e1071,earth,effects,elasticnet,ElemStatLearn,evclass,evtree,frbs,GAMBoost,gamboostLSS,gbm,ggRandomForests,glmnet,glmpath,GMMBoost,gradDescent,grf,grplasso,grpreg,h2o,hda,hdi,hdm,ICEbox,ipred,kernlab,klaR,lars,lasso2,LiblineaR,LogicRe,LTRCtrees,maptree,mboost,mlr,model4you,MXM,ncvreg,nnet,oem,OneR,opusminer,pamr,party,partykit,pdp,penalized,penalizedLDA,picasso,plotmo,quantregForest,randomForest,randomForestSRC,ranger,rattle,Rborist,RcppDL,rdetools,REEMtree,relaxo,rgenoud,RLT,Rmalschains,rminer,rnn,ROC,RoughSets,rpart,RPMM,RSNNS,RWeka,RXshrink,sda,SIS,spa,stabs,SuperLearner,svmpath,tensorflow,tgp,tree,trtf,varSelRF,vcrpart,wsrf,xgboost")
Pack_list <- unlist(strsplit(pack_list, ","))

# Daily download counts for every package in the list
tmpp <- cran_downloads(packages = Pack_list, from = "2018-01-01", to = "2019-01-01")
pack_name <- unique(tmpp$package)
dat <- as.data.table(tmpp)

# Total number of downloads per package
num_downloads <- NULL
for (i in seq_along(pack_name)) {
  num_downloads[i] <- dat[package == pack_name[i], count] %>% sum
}
names(num_downloads) <- pack_name
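To repeat the exercise for another task view without typing the package names by hand, something along the following lines may work. This is a sketch, not what I used for the post: `task_view_packages` is a hypothetical helper, and it assumes that the ctv package is installed and that each object returned by `ctv::available.views()` carries a `name` and a `packagelist` data frame with a `name` column, as described in the ctv documentation.

```r
# Hypothetical helper (not part of the original analysis): return the package
# names listed in one CRAN task view. By default `views` comes from
# ctv::available.views(), which needs the ctv package and internet access.
task_view_packages <- function(view, views = ctv::available.views()) {
  for (v in views) {
    if (identical(v$name, view)) {
      return(as.character(v$packagelist$name))
    }
  }
  stop("No task view named '", view, "' was found")
}

# Example usage (commented out because it queries CRAN):
# Pack_list <- task_view_packages("MissingData")
# tmpp <- cranlogs::cran_downloads(packages = Pack_list,
#                                  from = "2018-01-01", to = "2019-01-01")
```

Passing `views` as an argument keeps the helper easy to check offline with a small hand-made list before pointing it at CRAN.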