Nonstandard errors?

Nonstandard errors is the title of a recently published paper in the prestigious Journal of Finance, with more than 350 authors. At first glance the paper appears to mix apples and oranges. At second glance, it still looks that way. To be clear, the paper is mostly what you would expect from a top journal: stimulating, thought-provoking and impressive. However, my main reservation is with its conclusion and recommendations, which I think are off the mark.

I begin with a brief explanation of the paper's content and some of its results, and then share my own interpretation and perspective, for what it's worth.

What are nonstandard errors?

Say you hire two research teams to test the efficacy of a drug. You provide them with the same data. Later the two teams return with their results. Each team reports its estimated probability that the drug is effective, along with the (standard) standard error of that estimate. But since the two teams made different decisions along the way (e.g. how to normalize the data), their estimates differ. So there is additional (nonstandard) error, because their estimates are not identical despite the teams being asked the exact same question and given the exact same data. As the authors write, this “type of error can be thought of as erratic as opposed to erroneous”. It is simply extra variation stemming from the teams’ distinct analytical choices (e.g. how to treat outliers, how to impute missing values).
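To make this concrete, here is a toy illustration of my own (it is not from the paper): the same data, two defensible analytical choices, two different estimates. The spread across those estimates is the nonstandard part.

    # Toy illustration (mine, not the paper's): same data, two reasonable
    # analytical choices, two different answers to the same question.
    set.seed(7)
    y <- c(rnorm(200, mean = 0.3), 8)          # shared data; one gross outlier

    team_A <- mean(y)                          # team A keeps every observation
    team_B <- mean(y[abs(y - median(y)) < 3])  # team B discards extreme values

    c(team_A = team_A, team_B = team_B)
    sd(c(team_A, team_B))                      # spread across teams: the
                                               # "nonstandard" part of the error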

Things I love about the paper

  • Exceptional clarity, and phenomenal design-thinking.
  • The logistical orchestration of bringing together over 350 people in a structured way is not something to envy; I can only imagine the headaches involved. But it lends the paper remarkable power, both as proof that such large-scale collaboration is actually possible and, of course, through the valuable data and evidence it delivers.
  • On the content side, the paper brilliantly alerts readers that the results of any research depend heavily on the decision path chosen by the research team (e.g. which model, which optimization algorithm, which data frequency). Results and decision path go beyond basic dependency – there is a profound reliance at play. This is true for theoretical work (“under assumptions 1-5…”), doubly so for empirical studies, and in my view triply so for empirical work in the social sciences. Below are the point estimates and their distribution for 6 different hypotheses which 164 research teams were asked to test (again, using the same data). Setting the hypotheses’ details aside for now, you can see that there is sizable variation around the point estimates.
    [Figure: dispersion of estimates across the 6 hypotheses]
    Not only is the extent of the variation eyebrow-raising, but in most cases there is not even agreement on the sign…

    The paper dives deeper. A few more insights: if we look only at top research teams (setting aside for now how “top” is actually determined), the situation is a bit better. Also, when the researchers were asked to estimate the across-teams variation, they tended to underestimate it.

    What you see is that most research teams underestimate the actual variation (black dots below the big orange dot), and that is true for all 6 hypotheses tested. This very much echoes Daniel Kahneman’s work: “We are prone to overestimate how much we understand about the world”.

  • What is the main contributor to the dispersion of estimates? You guessed it: the statistical model chosen by the researchers.

Things I don’t like about the paper

The authors claim that this extra, decision-path-induced variation adds uncertainty, and that this is undesirable. Because of that, they argue, a better approach would be to perfectly align on the decision path.

Six months ago I made a LinkedIn comment about the paper, based on a short 2-minute video.

Yes, it took six months, but having now read the paper in full I feel that my flat, shooting-from-the-hip comment is still valid (although I regret the language I chose).

In the main, any research paper is, and if not then it should be, read as a stand-alone input to our overall understanding. I think it is clear to everyone that what they read is true conditional on what was actually done.

It’s not that I don’t mind reading that a certain hypothesis holds when checked at a daily frequency but is reversed when checked at a monthly frequency; I WANT to read that. Then I want to read why the researchers made the decisions they made, make up my own mind, and relate it to what I need in my own context.

Do we want to dictate a single procedure for each hypothesis? It is certainly appealing. We would have an easier time pursuing the truth: one work (where the decision path is agreed upon) for one hypothesis, and we would have no uncertainty and no across-researchers variation. But the big BUT is this, in the words of the paper's own authors: “there simply is no right path in an absolute sense”. The move to a fully aligned single procedure boils down to a risk transfer. Rather than the risk of researchers taking wrong turns on their decision paths (or even p-hacking), we now carry another, in my opinion higher, risk: that our aligned procedure is wrong for all researchers. So the uncertainty is still there, only now it is swept under the rug. That is even more worrisome than the across-researchers variation we CAN observe.

While I commend the scientific pursuit of truth, there isn't always one truth to uncover. Everything is a process. In the past, stuttering was treated by placing pebbles in the mouth. More recently (and maybe even still), university courses in economics ruled out negative interest rates on the grounds that everyone would simply hold cash. When the time came, it turned out there were not enough mattresses.

Across-researchers variation is actually something you want. If it is small, the problem is not hard enough (everyone agrees on how to check it). So, should we just ignore across-researchers variation? Also no. Going back to my opening point, the paper brilliantly captures the scale of this variation. Just be acutely aware that two research teams are not checking one thing (even when working on the same data and testing the same hypothesis); they are checking two things: the same hypothesis, but under the particular analytical choices each team made. We have it harder in that we need to consume more research output, but that is a small price compared to the alternative.

Footnote

While reading the paper I thought it would sometimes be better to report a trimmed standard deviation, given how sensitive the standard deviation is to outliers.
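For concreteness, here is a minimal sketch of what I mean (my own helper function, not anything taken from the paper):

    # Trimmed standard deviation: drop a fraction of the most extreme
    # observations from each tail, then compute the usual sd().
    trimmed_sd <- function(x, trim = 0.05) {
      x <- sort(x[!is.na(x)])
      k <- floor(length(x) * trim)        # observations to drop per tail
      sd(x[(k + 1):(length(x) - k)])
    }

    set.seed(1)
    x <- c(rnorm(100), 25)                # one gross outlier
    sd(x)                                 # inflated by the outlier
    trimmed_sd(x)                         # closer to the dispersion of the bulk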

On Writing

Each year I supervise several data-science master's students, and each year I find myself repeating the same advice. The situation has worsened since students started (mis)using GPT models. I have therefore written this blog post to highlight a few important, and often overlooked, aspects of thesis writing. Many of the points also apply to writing in general.

Continue reading

Rython tips and tricks – Clipboard

For whatever reason, the clipboard functionality available from Rython is under-utilized. A utility function for reversing backslashes can be found here. This post demonstrates how you can use the clipboard to bypass saving and loading files. It is convenient when you just want a quick insight or visual, rather than a full-blown replicable process.
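The gist is something along these lines (my own sketch, using the clipr package, which the post does not necessarily rely on):

    # Copy a table from a spreadsheet, then pull it straight into R via the
    # clipboard; no intermediate file needed. Assumes clipr is installed.
    library(clipr)

    dat <- read_clip_tbl()                      # parse clipboard content as a data frame
    summary(dat)

    write_clip(capture.output(summary(dat)))    # push a quick result back to the clipboard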

Continue reading

Rython tips and tricks – Snippets

R or Python? Who cares! Which editor? Now that's a different story.

I like RStudio for many reasons. Beyond the personal ones, RStudio allows you to write both R + Python = Rython in the same script. Apart from that, the editor's level of complexity is well balanced: not a functionality overkill like some, nor too simplistic like others. This post shares how to save time with snippets (easy in RStudio). Snippets save time by reducing the amount of typing required; they are the most convenient way to program copy-pasting into the machine's memory.

In addition to the useful built-in snippets provided by RStudio, like lib or fun for R and imp or def for Python, you can write your own. Below are a couple I wrote myself that you might find helpful. But first, we start with how to use snippets.
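For a flavor of the format, here is a hypothetical custom snippet (not one of the snippets from the post). It goes into the r.snippets file via Tools > Global Options > Code > Edit Snippets, and expands when you type its name and press Tab:

    # Hypothetical snippet; body lines must be indented with a tab.
    snippet hdr
    	# --------------------------------------------------
    	# Script : ${1:script_name}.R
    	# Author : ${2:your_name}
    	# Date   : `r format(Sys.Date())`
    	# --------------------------------------------------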

Continue reading

On Writing Math

There are many examples of skills that, despite being greatly needed, we never receive any formal training for; at least, nothing is built into our core educational programs. A few examples: how to read well, how to listen well, or how to develop a can-do mental attitude. Writing well, in particular writing math, is another such skill. Here I share a few pointers from my own experience of reading and writing math.

Continue reading

R Packages Download Stats

One big advantage of using open-source tools is the fantastic ecosystems that typically accompany them. Being able to tap into a massive open-source community by downloading freely available code is decidedly useful. But, yes, there are downsides to downloads.

For one, there are too many packages out there, including imperfect duplicates, so you can easily end up downloading code/packages/modules inferior to existing alternatives. Second, there is the matter of security. I myself try to refrain from downloading relatively new code that is not yet tried and true. How do we know if a package is solid?
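One crude but quick proxy is download counts. A minimal sketch, assuming the cranlogs package is installed and you are online:

    # Compare last month's CRAN downloads for two packages.
    library(cranlogs)

    dl <- cran_downloads(packages = c("dplyr", "data.table"), when = "last-month")
    aggregate(count ~ package, data = dl, FUN = sum)   # total downloads per package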

Continue reading

Asking Good Questions

Recently, I was lucky enough to speak at the 7th International Conference on Time Series and Forecasting (ITISE). The conference itself had an excellent collection of talks, with applications in completely different fields: energy, neuroscience and, how could we not, a great deal of COVID-19-related forecasting papers. It was a mix of online and in-person presentations, and with a slew of technical hiccups consuming a lot of valuable minutes, time was of the essence. Very few minutes, if any, were left for questions. I attended my first conference well over a decade ago, and my strong feeling is that things have not changed much since. There is simply not enough training on how slides should (and should not) look, how to deliver a 20-minute talk about a paper that took a year to draft, and indeed, which questions are good and which are just expensive folly.

Continue reading

R tips and tricks – shell.exec

When you start up your machine, the first thing you do is open the various programs you work with: your note-taking program, the PDF file you need to read, the PPT file you were last working on, and of course your strongest link to the outside world nowadays, your email inbox. This post shows how to automate this process. Windows machines notoriously need restarting after every little (un)install, so I trust you will find this startup-automation advice handy.
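The gist is roughly the following (a minimal sketch; shell.exec() is Windows-only and the paths below are made up, so point them at your own files):

    # Open everything you need in one go; each item launches in its default program.
    startup <- c(
      "C:/notes/todo.txt",
      "C:/papers/to_read.pdf",
      "C:/slides/current_talk.pptx",
      "https://mail.google.com"          # URLs open in the default browser
    )
    invisible(lapply(startup, shell.exec))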

Continue reading

R tips and tricks – Timing and profiling code

Modern statistical methods rely on simulation: generating different scenarios and repeating them thousands of times over. As a result, even trivial operations can become a computational burden.

In the words of my favorite statistician Bradley Efron:

“There is some sort of law working here, whereby statistical methodology always expands to strain the current limits of computation.”

In addition to the need for faster computation, the richness of the open-source ecosystem means you often encounter different functions that do the same thing, sometimes even under the same name. This post explains how to measure the computational efficiency of a function so you know which one to use, with a couple of actual examples of reducing computation time.
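As a taste of the approach (a minimal sketch assuming the microbenchmark package, not the post's actual examples):

    # Two ways to compute column means: same result, very different speed.
    library(microbenchmark)

    x <- matrix(rnorm(1e6), ncol = 100)
    microbenchmark(
      apply_mean = apply(x, 2, mean),
      col_means  = colMeans(x),
      times = 50
    )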

Continue reading

R + Python = Rython

Enough! Enough with that pointless R versus Python debate. I find it almost as pointless as the Bayesian vs Frequentist “dispute”. I advocate here what I advocated there (“..don’t be a Bayesian, nor be a Frequentist, be an opportunist“).

Nowadays even marginally tedious computation is sent off to faster, minimum-overhead languages like C++. So it is mainly syntax administration we insist on insisting on. What does it matter if we have this:

Or that:

Continue reading

R tips and tricks – on-screen colors

I like using R for many reasons. Two of those are (1) its easy integration with almost any software you can think of, and (2) its graphical powers. Color-wise, I dare say you have probably plotted, re-specified your colors, plotted again, and iterated until you found what works for your specific chart. Here you can find a modern visualization that lets you quickly find the colors you are looking for and see how they look on screen. See below for a quick demo.
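Something as small as the following already helps (a toy demo of my own, not the post's visualization):

    # Preview a few built-in R colors side by side before committing to them.
    cols <- c("steelblue", "tomato", "darkolivegreen3", "goldenrod", "orchid")
    barplot(rep(1, length(cols)), col = cols, names.arg = cols,
            border = NA, axes = FALSE, las = 2)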

Continue reading

The Distribution of the Sample Maximum

Where I work we are now hiring. We took a few time-consuming actions to make sure we have a large pool of candidates to choose from. But what is the value of having a large pool of candidates? Intuitively, the more candidates you have, the better the chance that you end up with a strong prospective candidate in terms of experience, talent and skill set (call that one candidate “the maximum”). But what are we actually talking about? Is it meaningful? If there is a big difference between 10 candidates and 1,500 candidates, but very little difference between 10 candidates and 80 candidates, it means that our publicity and screening efforts are not very fruitful/efficient. Perhaps it would be better to run quickly over a small pool, a few dozen candidates, and choose the best fit. Below I try to cast this question in terms of the distribution of the sample maximum (think: how much better is the best candidate as the number of candidates grows?).
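To make the question concrete, here is a toy simulation of my own (not the post's analysis). If candidate quality is drawn i.i.d. from a distribution F, the best of n candidates has CDF F(x)^n; below, the expected maximum of a standard-normal quality score as the pool grows:

    # Expected maximum of n i.i.d. standard-normal draws, for a few pool sizes.
    set.seed(42)
    pool_sizes   <- c(10, 80, 1500)
    expected_max <- sapply(pool_sizes, function(n) {
      mean(replicate(5000, max(rnorm(n))))
    })
    data.frame(pool_sizes, expected_max)
    # Each additional candidate adds less and less to the expected maximum.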

Continue reading