On Writing

Each year I supervise several data-science master’s students, and each year I find myself repeating the same advises. Situation has worsen since students started (mis)using GPT models. I therefore have written this blog post to highlight few important, and often overlooked, aspects of thesis-writing. Many points apply also to writing in general.

Continue reading

Rython tips and tricks – Clipboard

For whatever reason, clipboard functionalities from Rython are under-utilized. One utility function for reversing backslashes is found here. This post demonstrates how you can use the clipboard to circumvent saving and loading files. It’s convenient for when you just want the quick insight or visual, rather than a full-blown replicable process.

Continue reading

On Writing Math

There are a lot of examples for skills that despite being greatly needed, we never get any formal training for. At least nothing is built into our core educational programs. Few examples are: how to read well, how to listen well, or how to develop your can-do mental attitude. Writing well, in particular math-writing, is another such example. Here I share few pointers from my own experience of reading and writing math.

Continue reading

R tips and tricks – Timing and profiling code

Modern statistical methods use simulations; generating different scenarios and repeating those thousands of times over. Therefore, even trivial operations burden computational speed.

In the words of my favorite statistician Bradley Efron:

“There is some sort of law working here, whereby statistical methodology always expands to strain the current limits of computation.”

In addition to the need for faster computation, the richness of open-source ecosystem means that you often encounter different functions doing the same thing, sometimes even under the same name. This post explains how to measure the computational efficacy of a function so you know which one to use, with a couple of actual examples for reducing computational time.

Continue reading

R tips and tricks, on-screen colors

I like using Rlogo for many reasons. Two of those are (1) easy integration with almost whichever software you can think of, and (2) for its graphical powers. Color-wise, I dare to assume you probably plotted, re-specified your colors, plotted again, and iterated until you found what works for your specific chart. Here you can find modern visualization so you are able to quickly find the colors you look for, and to quickly see how it looks on screen. See below for quick demo.

Continue reading

Kaggle Experience

At least in part, a typical data-scientist is busy with forecasting and prediction. Kaggle is a platform which hosts a slew of competitions. Those who have the time, energy and know-how to combat real-life problems, are huddling together to test their talent. I highly recommend this experience. A side effect of tackling actual problems (rather than those which appear in textbooks), is that most of the time you are not at all enjoying new wonderful insights or exploring fascinating unfamiliar, ground-breaking algos. Rather, you are handling\wrangling\manipulating data, which is.. ugly and boring, but necessary and useful.

I tried my powers few years ago, and again about 6 months ago in one of those competitions called Toxic Comment Classification Challenge. Here are my thoughts on that short experience and some insight from scraping the results of that competition.

Continue reading

Bitcoin exponential growth

Is bitcoin a bubble? I don’t know. What defines a bubble? The price should drastically overestimate the underlying fundamentals. I simply don’t know much about blockchain to have an opinion there. A related characteristic is a run-away price. Going up fast just because it is going up fast.

Continue reading

The annual useR! conference

This year on 4th of July I will be attending the annual usrR! conference. While it is often in the US, this year the UseR! conference takes place in the nearby Brussels. Sweet.

The website is state-of-the-art “don’t make me think” style. The program looks amazing. Belgian beers with the R community, exciting. Registration still open.

Watch this space for highlights and afterthoughts.

Live volatility monitor

In April this year, Rstudio notified early users of shiny that Glimmer and Spark servers which host interactive-applications would be decommissioned. Basically, the company is moving forward to generate revenues from this great interactive application service. For us aspirants who use the service strictly as a hobby, that means, in a word: pay.

Basic subscription now costs around 40$ per month. Keeping your applications free of charge is possible BUT, as long as it is not used for more than 25 hours per month. So if your site generate some traffic, most users would simply not be able to access the app. Apart from that, you are subject to some built-in Rstudio’s logo which can’t be removed without having a paid subscription. That is a shame, but a company’s gotta eat right? I am using Rstudio’s services from their very beginning, and the company definitely deserve to eat! only I wish there would be another step between the monthly 0$ option which provides too slim capabilities, and the monthly 40$ option which is, in my admittedly biased opinion, too pricey for a ‘sometimes’ hobby.

Continue reading