Multivariate summary function in R

Some time ago, I wrote a "Better summary function in R". Here is its multivariate extension, which applies that univariate summary to every column of a matrix or data.frame:

# msumm: apply a univariate summary to every column of y (a numeric matrix or data.frame)
msumm = function(y){ # multivariate summary
  if (!requireNamespace("moments", quietly = TRUE)) {
    stop("The function requires the moments package. To install it, run 'install.packages(\"moments\")'.")
  }
  usumm = function(x, h = 2){ # univariate summary; h is the number of values shown by head() and tail()
    a1 = suppressWarnings(data.frame(min = min(x, na.rm = TRUE), med = median(x, na.rm = TRUE),
      mean = mean(x, na.rm = TRUE), max = max(x, na.rm = TRUE), sd = stats::sd(x, na.rm = TRUE),
      skew = moments::skewness(x, na.rm = TRUE), kurt = moments::kurtosis(x, na.rm = TRUE)))
    naat = which(is.na(x))
    # set the flag before naat is replaced by the message, otherwise it is always "Yes"
    missing = ifelse(length(naat) == 0, "No", "Yes")
    if (length(naat) == 0) naat = "No NA's in the series"
    list(summary.stat = a1, na.at = naat, head.x = head(x, h), tail.x = tail(x, h),
      length.x = length(x), missing = missing)
  }
  l1 = apply(y, 2, usumm) # named list, one element per column
  # stack the one-row summaries and append the missing flag as the last column
  stats = do.call(rbind, lapply(l1, function(el) el$summary.stat))
  stats$missing = vapply(l1, function(el) el$missing, character(1))
  rownames(stats) <- names(l1)
  print(stats)
  return(l1)
}
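In usumm, the order of the two NA checks matters: if naat is overwritten with the "No NA's in the series" message before the missing flag is computed, length(naat) is 1 and every column is reported as missing. Below is a minimal base R sketch of the intended logic, using a hypothetical helper na_info() that is not part of the function above. It also shows that suppressWarnings silences more than sd: for a column that is entirely NA, min() and max() with na.rm = TRUE warn and return Inf and -Inf.

```r
# Hypothetical helper (not from the post) mirroring the NA bookkeeping in usumm
na_info = function(x) {
  naat = which(is.na(x))
  # Decide the flag first; replacing naat with a message would make its length 1
  missing = if (length(naat) == 0) "No" else "Yes"
  if (length(naat) == 0) naat = "No NA's in the series"
  list(na.at = naat, missing = missing)
}

na_info(c(1:8, NA, 10))   # na.at = 9, missing = "Yes"
na_info(rnorm(10))        # na.at = "No NA's in the series", missing = "No"

# For an all-NA column, min() warns "no non-missing arguments to min; returning Inf";
# suppressWarnings() keeps the value and drops the warning.
all_na = c(NA_real_, NA_real_)
suppressWarnings(min(all_na, na.rm = TRUE))   # Inf
```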

Generate some fictitious data and look at the results, which are pretty much self-explanatory. The last column indicates whether the variable has missing values.

d = data.frame(rnormal = rnorm(10), 
mat = matrix(c(1:8,NA,10),nrow = 10, ncol = 1) )
m = msumm(d)
#### Here is the print:
#              min          med       mean        max        sd       skew     kurt missing
# rnormal -1.671465 -0.001411399 -0.1643362  0.9381707 0.7642856 -0.5742236 2.742984      No
# mat      1.000000  5.000000000  5.1111111 10.0000000 2.9344695  0.1995093 2.000760     Yes
names(m) # the names of the variables in the matrix/data.frame:
# [1] "rnormal" "mat"  

#### Zoom in on the "mat" variable:
m$mat
# $summary.stat
#   min med     mean max       sd      skew    kurt
# 1   1   5 5.111111  10 2.934469 0.1995093 2.00076
# 
# $na.at    # indicates the location of the NA values:
# [1] 9    
# 
# $head.x
# [1] 1 2
# 
# $tail.x
# [1] NA 10
# 
# $length.x
# [1] 10
# 
# $missing  
# [1] "Yes"
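The skew and kurt columns follow the definitions used by the moments package: skewness is m3/m2^(3/2) and kurtosis is m4/m2^2, where mk is the k-th central moment with divisor n, so a normal distribution has kurtosis 3 (this is not excess kurtosis). As a sanity check, here is a base R sketch that recomputes both for the non-missing values of mat; central_moment() is a hypothetical helper, and the expected values are the ones in the printed output above.

```r
# Hypothetical helper: k-th central moment with divisor n (NAs dropped)
central_moment = function(x, k) {
  x = x[!is.na(x)]
  mean((x - mean(x))^k)
}

mat = c(1:8, NA, 10)
skew = central_moment(mat, 3) / central_moment(mat, 2)^(3/2)
kurt = central_moment(mat, 4) / central_moment(mat, 2)^2
round(c(skew = skew, kurt = kurt), 6)   # skew 0.199509, kurt 2.000760
kurt - 3                                # excess kurtosis, about -2
```

Subtracting 3 gives excess kurtosis, which is what some other packages report by default.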

Note:
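1. This function uses the "skewness" and "kurtosis" functions from package "moments", so you need that package installed to avoid errors. Package "e1071" also provides "skewness" and "kurtosis", but its "kurtosis" reports excess kurtosis by default, so its numbers are not directly comparable with the ones above.
2. I use "suppressWarnings" since the function "sd" produced some meaningless warnings that I wanted to ignore. Be aware that it silences every warning in that call, including the ones "min" and "max" issue for a column that is entirely NA.
3. The function is designed to handle numeric data: "apply" converts a data.frame to a matrix, so a single character column turns every column into character. It is straightforward to extend it to other class types. (In fact, I have no idea if it's "straightforward", but it is common (bad) practice to phrase it that way when you have no time to actually do it. Another alternative is: "for the sake of brevity I refer the interested reader to... and skip it here".)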
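One concrete reason for the numeric-only restriction is that apply() works on as.matrix() of the data.frame, and a matrix holds a single type. A minimal base R sketch of one possible workaround (my assumption, not something msumm does) is to iterate over the columns with lapply()/sapply(), which keep each column's type, and to keep only the numeric columns with Filter(is.numeric, ...). The data.frame mixed is made up for illustration.

```r
# Hypothetical mixed-type data, made up for illustration
mixed = data.frame(a = c(1.5, 2, NA), b = c("x", "y", "z"))

# apply() works on as.matrix(mixed), a character matrix, so no column is numeric
apply(mixed, 2, is.numeric)    # a: FALSE, b: FALSE

# sapply() visits the data.frame columns directly and keeps their types
sapply(mixed, is.numeric)      # a: TRUE, b: FALSE

# Keep only the numeric columns before summarising them
num = Filter(is.numeric, mixed)
sapply(num, function(x) c(min = min(x, na.rm = TRUE), max = max(x, na.rm = TRUE)))
```

Filter() drops the other columns silently, so a version of msumm built this way may want to report which columns it skipped.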