R tips and tricks – boxplots for large data

Admit it: you always thought there was something off about how boxplots look. You could tell there should be a way to show more information; they simply look far too spacious. Evidently you are not the only one. Many people have suggested better ways to plot the same information. Here is an overview: 40 years of boxplots.

I found the following paper: "Letter-Value Plots: Boxplots for Large Data" by Heike Hofmann, Hadley Wickham and Karen Kafadar, Journal of Computational and Graphical Statistics, Vol. 26, Iss. 3, 2017. This paper presents the improvement we did not know we wanted. The authors propose a new kind of boxplot that looks better and conveys more information. In my opinion, within a few months or years this new way of boxplotting will become the new standard, and the old boxplot will finally perish.

Boxplotting in this new way, you get much more bang for your buck (a higher information-to-ink ratio). I replicated a few figures from the paper using lvplot.

Who else but Hadley? (answer: Heike Hofmann and Karen Kafadar).

Generating those charts is as easy as pie:
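For readers who want to see what happens under the hood, here is a minimal, package-free sketch in base R of what a letter-value plot computes and draws. It is not the lvplot implementation: the function names `letter_values()` and `lv_plot()`, the default number of levels (`floor(log2(n)) - 3`) and the box-width scheme are my own assumptions. The depth rule is the usual letter-value one: the median sits at depth (n + 1)/2, and each further letter value (fourths, eighths, sixteenths, ...) sits at depth (floor(previous depth) + 1)/2, with a fractional depth meaning the average of the two neighbouring order statistics.

```r
# Minimal base-R sketch of a letter-value plot for one numeric sample.
# Assumptions (not from the post or the lvplot package): default number of
# levels k = max(1, floor(log2(n)) - 3), box widths shrinking linearly with
# depth, and points beyond the outermost letter value drawn individually.

letter_values <- function(x, k = NULL) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  stopifnot(n >= 1)
  if (is.null(k)) k <- max(1, floor(log2(n)) - 3)  # assumed rule of thumb
  # Value at a (possibly fractional) depth d counted from the start of v;
  # a fractional depth averages the two neighbouring order statistics.
  at_depth <- function(v, d) (v[floor(d)] + v[ceiling(d)]) / 2
  depth <- numeric(k)
  depth[1] <- (n + 1) / 2  # level 1 is the median
  if (k > 1) for (i in 2:k) depth[i] <- (floor(depth[i - 1]) + 1) / 2
  data.frame(
    level = seq_len(k),
    depth = depth,
    lower = vapply(depth, function(d) at_depth(x, d), numeric(1)),
    upper = vapply(depth, function(d) at_depth(rev(x), d), numeric(1))
  )
}

lv_plot <- function(x, k = NULL) {
  x <- x[!is.na(x)]
  lv <- letter_values(x, k)
  k <- nrow(lv)
  plot(NA, xlim = c(0, 2), ylim = range(x), xaxt = "n",
       xlab = "", ylab = "value")
  if (k > 1) {
    # Darkest shade for the fourths (level 2), lighter for deeper levels.
    shades <- grey(seq(0.35, 0.85, length.out = k - 1))
    # Draw the deepest (tallest, narrowest) box first so that the
    # shallower, wider boxes are painted on top of it.
    for (i in k:2) {
      half_width <- 0.8 * (k - i + 1) / (k - 1)
      rect(1 - half_width, lv$lower[i], 1 + half_width, lv$upper[i],
           col = shades[i - 1], border = "white")
    }
  }
  # Median as a thick horizontal line across the widest box.
  segments(0.2, lv$lower[1], 1.8, lv$lower[1], lwd = 2)
  # Observations beyond the outermost letter value, drawn as points.
  out <- x[x < lv$lower[k] | x > lv$upper[k]]
  points(rep(1, length(out)), out, pch = 20, cex = 0.5)
  invisible(lv)
}
```

A typical call is `lv_plot(rexp(10000))`; with a skewed sample like this the nested boxes become visibly asymmetric, which is exactly the tail information a classic boxplot throws away. The function also returns the letter-value table invisibly, so you can inspect the numbers behind each box.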

One word of warning: these charts are meant for large data sets. Each additional box is drawn from a quantile estimate further out in the tails, and those estimates are unreliable when you have only a few data points. If your data set is small, you had better dust off those soon-to-expire charts from the '70s.
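To see why small samples are a problem, it helps to look at the depth of each letter value, i.e. which order statistic it is read from. The sketch below applies the standard letter-value depth rule (median at depth (n + 1)/2, then (floor(previous depth) + 1)/2 for each further level); the helper name `lv_depths()` is hypothetical.

```r
# Depth of the first k letter values for a sample of size n.
# Depth d means the letter value is read from the d-th smallest
# (and d-th largest) observation; a fractional depth averages two of them.
lv_depths <- function(n, k) {
  d <- numeric(k)
  d[1] <- (n + 1) / 2  # median
  if (k > 1) for (i in 2:k) d[i] <- (floor(d[i - 1]) + 1) / 2
  d
}

lv_depths(30, 6)   # -> 15.5, 8, 4.5, 2.5, 1.5, 1
lv_depths(1e5, 6)  # -> 50000.5, 25000.5, 12500.5, 6250.5, 3125.5, 1563
```

With 30 observations the fifth level already sits at depth 1.5, so it is little more than the average of the two most extreme points, and the sixth level is just the sample minimum and maximum. With 100,000 observations even the sixth level is still backed by roughly 1,560 points in each tail.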

Spread the new boxplot word.


