Some time ago, I wrote a "Better summary function in R". Here is its multivariate extension:
summary(x)
msumm = function(y){ # multivariate summary: one univariate summary per column of y
  # Check once that the moments package is installed, without attaching it.
  if (!requireNamespace("moments", quietly = TRUE)) {
    stop("The function requires the moments package. To install it, run 'install.packages(\"moments\")'.\n")
  }
  usumm = function(x, h = 2){ # h is the number of values to print in the head and tail functions.
    a1 = suppressWarnings(data.frame(min = min(x, na.rm = TRUE), med = median(x, na.rm = TRUE),
      mean = mean(x, na.rm = TRUE), max = max(x, na.rm = TRUE), sd = stats::sd(x, na.rm = TRUE),
      skew = moments::skewness(x, na.rm = TRUE), kurt = moments::kurtosis(x, na.rm = TRUE)))
    headh = head(x, h) ; tailh = tail(x, h)
    naat = which(is.na(x))
    # Set the flag before naat is overwritten with a message; otherwise it is always "Yes".
    missing = ifelse(length(naat) == 0, "No", "Yes")
    if (length(naat) == 0) naat = c("No NA's in the series")
    l = list(summary.stat = a1, na.at = naat, Head = headh, Tail = tailh, Length = length(x),
      missing = missing)
    return(l)
  }
  l1 = apply(y, 2, usumm) # a named list with one element per column
  stats = NULL ; missing = NULL
  for (i in seq_along(l1)){
    stats = rbind(stats, l1[[i]]$summary.stat)
    missing = cbind(missing, l1[[i]]$missing)
  }
  stats = cbind(stats, missing = t(missing))
  rownames(stats) <- names(l1)
  print(stats)
  return(l1)
}
Generate some fictitious data and see the results. The results are pretty much self-explanatory. The last column indicates whether the variable has missing values.
d = data.frame(rnormal = rnorm(10),
mat = matrix(c(1:8,NA,10),nrow = 10, ncol = 1) )
m = msumm(d)
#### Here is the print:
#               min          med       mean        max        sd       skew     kurt missing
# rnormal -1.671465 -0.001411399 -0.1643362  0.9381707 0.7642856 -0.5742236 2.742984      No
# mat      1.000000  5.000000000  5.1111111 10.0000000 2.9344695  0.1995093 2.000760     Yes
names(m) # the names of the variables in the matrix/data.frame:
# [1] "rnormal" "mat"
#### Zoom in on the "mat" variable:
m$mat
# $summary.stat
#   min med     mean max       sd      skew    kurt
# 1   1   5 5.111111  10 2.934469 0.1995093 2.00076
#
# $na.at # indicates the location of the NA values:
# [1] 9
#
# $Head
# [1] 1 2
#
# $Tail
# [1] NA 10
#
# $Length
# [1] 10
#
# $missing
# [1] "Yes"
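The skew and kurt values printed for "mat" can be checked without any package. The sketch below assumes the plain moment-based definitions (central moments divided by n, no small-sample correction, kurtosis not reduced by 3), which is my understanding of what the "moments" package computes. The names central_moment, skew_pop and kurt_pop are made up for this illustration and are not part of msumm.

```r
# Hypothetical helpers, not part of msumm: moment-based skewness and kurtosis.
central_moment = function(x, k) {
  x = x[!is.na(x)]          # drop missing values, like na.rm = TRUE
  mean((x - mean(x))^k)     # k-th central moment, dividing by n
}
skew_pop = function(x) central_moment(x, 3) / central_moment(x, 2)^1.5
kurt_pop = function(x) central_moment(x, 4) / central_moment(x, 2)^2 # not excess kurtosis

mat = c(1:8, NA, 10)        # same values as the "mat" column above
c(skew = skew_pop(mat), kurt = kurt_pop(mat))
# compare with skew 0.1995093 and kurt 2.00076 in the print above
```

If the two agree on your machine, the numbers in the summary table are ordinary third and fourth standardized moments, nothing more exotic.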
Note:
1. This function uses the "kurtosis" and "skewness" functions. Both exist in package "moments" and in package "e1071", but the code above relies on "moments". If you switch to "e1071", change the package check accordingly and keep in mind that its kurtosis() reports excess kurtosis (3 is subtracted), so the numbers will differ.
2. I use "suppressWarnings" since the function "sd" produces some meaningless warnings, which I want to ignore.
3. The function is designed to handle numeric data. It is straightforward to extend it to other class types. (In fact, I have no idea if it's "straightforward", but it is common (bad) practice to phrase it as such when you have no time to actually do it. Another alternative is: "for the sake of brevity I refer the interested reader to... and skip it here".)
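Until such an extension exists, one cautious option is to pass only the numeric columns. This matters because apply() first turns a data frame into a matrix, and a single character column makes the whole matrix character. A minimal sketch, where numeric_only is a hypothetical helper and not part of the function above:

```r
# Hypothetical helper: keep only the numeric columns of a data frame.
numeric_only = function(df) {
  keep = vapply(df, is.numeric, logical(1)) # TRUE for numeric and integer columns
  df[, keep, drop = FALSE]                  # drop = FALSE keeps a data frame even with one column
}

d2 = data.frame(x = c(1.5, NA, 3), g = c("a", "b", "c"), n = 1:3)
names(numeric_only(d2))
# [1] "x" "n"

# m2 = msumm(numeric_only(d2)) # then summarize as before
```

This does not summarize the dropped columns at all; it only keeps them from breaking the numeric summary.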








