There are many skills that, despite being greatly needed, we never get any formal training for. At least nothing is built into our core educational programs. A few examples: how to read well, how to listen well, or how to develop a can-do mental attitude. Writing well, math writing in particular, is another such example. Here I share a few pointers from my own experience of reading and writing math.
Category: Miscellaneous
R Packages Download Stats
One big advantage of using open-source tools is the fantastic ecosystems that typically accompany them. Being able to tap into a massive open-source community by downloading freely available code is decidedly useful. But, yes, there are downsides to downloads.
For one, there are too many packages out there, including imperfect duplicates, so you can easily end up downloading a package that is inferior to an existing alternative. Second, there is the matter of security. I try to refrain from downloading relatively new code that is not yet tried-and-true. How do we know if a package is solid?
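One rough proxy for how widely a package is used is its CRAN download count. The following is a minimal sketch, assuming the cranlogs package is installed and a network connection is available. The two package names are arbitrary examples and total_downloads() is a hypothetical helper, not something taken from the post.

```r
# Hypothetical helper: total downloads per package from a data frame with
# columns `package` and `count` (the shape returned by cranlogs::cran_downloads).
total_downloads <- function(df) {
  totals <- aggregate(count ~ package, data = df, FUN = sum)
  totals[order(-totals$count), ]  # most downloaded first
}

# Fetch last month's daily downloads only if cranlogs is available.
if (requireNamespace("cranlogs", quietly = TRUE)) {
  daily <- tryCatch(
    cranlogs::cran_downloads(packages = c("data.table", "dplyr"),
                             when = "last-month"),
    error = function(e) NULL  # e.g. no network connection
  )
  if (!is.null(daily)) print(total_downloads(daily))
}
```

Download counts measure popularity rather than quality, so they are best read alongside other signals such as the age of the package and how actively it is maintained.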
Asking Good Questions
Recently, I was lucky enough to speak at the 7th International conference on Time Series and Forecasting (ITISE). The conference had an excellent collection of talks, with applications in completely different fields: energy, neuroscience and, how can we not, a great deal of COVID19-related forecasting papers. It was a mix of online and in-person presentations, and with a slew of technical hiccups consuming a lot of valuable minutes, time was of the essence. There were very few minutes, if any, for questions. I attended my first conference well over a decade ago, and my strong feeling is that things have not changed much since. There is simply not enough training when it comes to how slides should (and should not) look, how to deliver a 20-minute talk about a paper which took a year to draft, and indeed, which questions are good and which are just expensive folly.
R tips and tricks – shell.exec
When you start up your machine, the first thing you do is open the various programs you work with. Examples: your note-taking program, the pdf file that you need to read, the ppt file you were last working on, and of course your strongest link with the outside world nowadays: your email box. This post shows how to automate this process. Windows machines notoriously need restarting for every little (un)install. I trust you will find this startup automation advice handy.
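As a minimal sketch of the idea, assuming a Windows machine: base R's shell.exec() opens a file with the application associated with it. The paths below are placeholders and open_startup_items() is a hypothetical helper name, so treat this as an illustration rather than the post's own code.

```r
# Hypothetical list of files to open at startup; replace these
# placeholder paths with your own.
startup_items <- c(
  "C:/Users/me/Documents/notes.txt",
  "C:/Users/me/Documents/paper.pdf",
  "C:/Users/me/Documents/slides.pptx"
)

# Open each existing path with its default application.
# `opener` defaults to shell.exec(), which is available on Windows only.
open_startup_items <- function(paths, opener = shell.exec) {
  found <- paths[file.exists(paths)]
  for (p in found) opener(p)
  invisible(found)  # return the paths that were actually opened
}

# On a Windows machine:
# open_startup_items(startup_items)
```

Passing the opener as an argument keeps the helper testable on other platforms, where shell.exec() does not exist.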
R tips and tricks – Timing and profiling code
Modern statistical methods rely on simulations: generating different scenarios and repeating those thousands of times over. Therefore, even trivial operations can become a computational burden.
In the words of my favorite statistician Bradley Efron:
“There is some sort of law working here, whereby statistical methodology always expands to strain the current limits of computation.”
In addition to the need for faster computation, the richness of the open-source ecosystem means that you often encounter different functions doing the same thing, sometimes even under the same name. This post explains how to measure the computational efficiency of a function so you know which one to use, with a couple of actual examples for reducing computational time.
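A minimal sketch of such a comparison, using only base R's system.time(). The two functions compared (column means via apply() and via colMeans()) and the helper time_it() are chosen here for illustration and are not necessarily the examples used in the post.

```r
set.seed(1)
x <- matrix(rnorm(1e6), ncol = 100)

# Two ways to compute column means; both return the same numbers.
via_apply    <- function(m) apply(m, 2, mean)
via_colMeans <- function(m) colMeans(m)

stopifnot(isTRUE(all.equal(via_apply(x), via_colMeans(x))))

# system.time() reports elapsed seconds; repeating the call gives a
# more stable reading than a single run.
time_it <- function(f, m, reps = 20) {
  unname(system.time(for (i in seq_len(reps)) f(m))["elapsed"])
}

timings <- c(apply = time_it(via_apply, x),
             colMeans = time_it(via_colMeans, x))
print(timings)
```

Checking that both candidates return the same result before timing them avoids comparing functions that only look equivalent. For finer-grained measurements, a dedicated package such as microbenchmark is an option.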
R + Python = Rython
Enough! Enough with that pointless R versus Python debate. I find it almost as pointless as the Bayesian vs Frequentist “dispute”. I advocate here what I advocated there (“..don’t be a Bayesian, nor be a Frequentist, be opportunist”).
Nowadays even marginally tedious computation is being sent to faster, minimum-overhead languages like C++. So it’s mainly syntax administration that we insist on. What does it matter if we have this:
xsquare <- function(x){
  x^2
}
Or that
def xsquare(x):
    return x**2
R tips and tricks, on-screen colors
I like using R for many reasons. Two of those are (1) its easy integration with almost any software you can think of, and (2) its graphical powers. Color-wise, I dare to assume you have plotted, re-specified your colors, plotted again, and iterated until you found what works for your specific chart. Here you can find a modern visualization that lets you quickly find the colors you are looking for and quickly see how they look on screen. See below for a quick demo.
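For a quick look without any extra tooling, base R already ships a list of named colors via colors(). The sketch below is an assumption-laden illustration and not the visualization the post refers to: find_colors() and preview_colors() are hypothetical helper names.

```r
# Find R's built-in color names that contain a pattern.
find_colors <- function(pattern) {
  grep(pattern, colors(), value = TRUE)
}

# Draw one labelled swatch per color to see how each looks on screen.
preview_colors <- function(cols) {
  n <- length(cols)
  op <- par(mar = c(1, 8, 1, 1))  # wide left margin for the color names
  on.exit(par(op))                # restore graphical settings afterwards
  barplot(rep(1, n), col = cols, horiz = TRUE, names.arg = cols,
          las = 1, axes = FALSE, border = NA)
}

oranges <- find_colors("orange")
if (interactive()) preview_colors(oranges)
```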
Machine learning is simply statistics – part 2
Another opinion piece.
If you can’t explain it simply you don’t understand it well enough.
(Albert Einstein)
Forecast Combination in R – slides
The useR! 2019 conference, held in Toulouse, ended a couple of days ago.
The Distribution of the Sample Maximum
Where I work we are now hiring. We took a few time-consuming actions to make sure we have a large pool of candidates to choose from. But what is the value in having a large pool of candidates? Intuitively, the more candidates you have, the better the chance that you will end up with a strong prospective candidate in terms of experience, talent and skill set (call this one candidate “the maximum”). But how large is this effect, and is it meaningful? If there is a big difference between 10 candidates versus 1500 candidates, but very little difference between 10 candidates versus 80 candidates, it means that our publicity and screening efforts are not very fruitful or efficient. Perhaps it would be better to run quickly over a small pool of a few dozen candidates and choose the best fit. Below I try to cast this question in terms of the distribution of the sample maximum (think: how much better is the best candidate as the number of candidates grows).
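To make the question concrete: if candidate quality is drawn independently from a distribution with CDF F, the best of n candidates has CDF F(x)^n. The sketch below assumes, purely for illustration, that quality follows a standard normal distribution. That distributional choice and the function names are assumptions made here, not taken from the post.

```r
set.seed(123)

# Monte Carlo estimate of the expected maximum of n i.i.d. N(0, 1) draws.
expected_max <- function(n, reps = 2000) {
  mean(replicate(reps, max(rnorm(n))))
}

pool_sizes <- c(10, 80, 1500)
est <- sapply(pool_sizes, expected_max)
names(est) <- pool_sizes
print(round(est, 2))

# Exact probability that the best of n candidates exceeds a threshold x,
# using P(max <= x) = F(x)^n.
prob_best_exceeds <- function(x, n) 1 - pnorm(x)^n
print(prob_best_exceeds(2, pool_sizes))
```

Under the normal assumption the expected maximum grows slowly with n (roughly like the square root of 2 log n), which is the kind of diminishing return the hiring question is about. A heavier-tailed quality distribution would give a different picture.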
Matrix-style screensaver in R
This post shares a short code snippet to make your own Matrix-style screen saver in R:
Visualizing Time series Data
This post has two goals. I hope to make you think about your graphics, and think about the future of data-visualization. An example is given using some simulated time series data. A very quick read.
Kaggle Experience
At least in part, a typical data scientist is busy with forecasting and prediction. Kaggle is a platform which hosts a slew of competitions. Those who have the time, energy and know-how to combat real-life problems huddle together to test their talent. I highly recommend this experience. A side effect of tackling actual problems (rather than those which appear in textbooks) is that most of the time you are not at all enjoying wonderful new insights or exploring fascinating, unfamiliar, ground-breaking algos. Rather, you are handling, wrangling and manipulating data, which is ugly and boring, but necessary and useful.
I tried my powers a few years ago, and again about 6 months ago in one of those competitions, called the Toxic Comment Classification Challenge. Here are my thoughts on that short experience and some insight from scraping the results of that competition.
R tips and tricks – the assign() function
The R language has some quirks compared to other languages. One thing which you need to constantly watch for when moving to or from R is that R starts its indexing at one, while most other common languages start indexing at zero, which takes some getting used to. Another quirk is the explicit need for clarity when modifying a variable, compared with other languages.
Take Python for example, but I think it looks the same in most common languages:
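On the R side, the two quirks can be sketched as follows. This is an illustration under the assumption that the second quirk refers to R functions returning modified copies, so that a change has to be assigned back explicitly. The names add_item and series_1 to series_3 are hypothetical.

```r
x <- c(10, 20, 30)
x[1]            # the first element is at index 1, not 0

# Functions work on copies, so a change must be assigned back explicitly.
add_item <- function(v, item) c(v, item)
add_item(x, 40) # returns a longer vector, but x itself is unchanged
x <- add_item(x, 40)

# assign() does the same when the variable name is only known as a string,
# for example when creating several objects in a loop.
for (i in 1:3) {
  assign(paste0("series_", i), x * i)
}
series_2
```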
The annual useR! conference
This year, on the 4th of July, I will be attending the annual useR! conference. While it is often in the US, this year the useR! conference takes place in nearby Brussels. Sweet.
The website is state-of-the-art “don’t make me think” style. The program looks amazing. Belgian beers with the R community, exciting. Registration still open.
Watch this space for highlights and afterthoughts.