Better summary function in R

The summary function in R returns:

summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   9.14   10.70   11.10   11.30   12.10   13.60
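For comparison, the six numbers that summary() prints for a numeric vector are the quantile() output with the mean slotted into the middle. Below is a minimal sketch on made-up data. The vector x is simulated and is not the series from this post. It assumes R 4.0.0 or later, where summary.default() keeps full precision and only its print method rounds.

```r
# What summary() reports for a numeric vector, rebuilt from quantile() and mean().
set.seed(1)
x <- rnorm(207, mean = 11.3, sd = 1)   # simulated data, not the series from the post
s <- summary(x)
manual <- c(quantile(x, c(0, 0.25, 0.5)), mean(x), quantile(x, c(0.75, 1)))
all.equal(as.numeric(s), as.numeric(manual))   # TRUE (R >= 4.0.0)
```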

For the univariate case, I wrote what I consider a better summary function, usum(), which returns:

usum(x) # For univariate Summary
Summary Statistics:
   min   med  mean   max    sd   skew    kurt
1 9.14 11.13 11.35 13.65 1.057 0.3028 -0.6389
------
NA's?:
No NA's in the series
------
Head                   Tail
13.65 13.55            13.08 13.13
------
Length           Class
207            numeric


The ideas behind it are:
1. I don’t really care about the quartiles; I can always draw a box plot or a histogram.
2. I think the standard deviation, skewness and kurtosis are interesting.
3. I want to know not only whether I have NA’s but also where they are. It is somewhat less disturbing to have two consecutive NA’s at the end of the series than at random locations. For example:

usum(c(x,NA)) # will return:
Summary Statistics:
   min   med  mean   max    sd   skew    kurt
1 9.14 11.13 11.35 13.65 1.057 0.3028 -0.6389
------
NA's?:
208 # the location of your NA's
------
Head                   Tail
13.65 13.55            13.13 NA
------
Length           Class
208            numeric

4. I always want to know the length and at least the “head” or the “tail”, to double-check that I loaded the series I intended and not a different one by mistake.
5. Visualize: the function has a “plot1” argument which, when TRUE (the default), draws a 2-by-1 figure with an index plot of the series (points joined by lines) in the upper panel and a histogram in the lower panel.

library(e1071)  # provides skewness() and kurtosis(); package "moments" also works (see Note 1)

usum <- function(x, plot1 = TRUE, h = 2) {
  # h is how many head/tail values you want to see
  if (is.factor(x)) stop("class factor not supported, use standard summary function")
  class.x <- class(x)
  if (plot1) {
    op <- par(mfrow = c(2, 1))  # two panels, one above the other
    on.exit(par(op))            # restore the previous layout when the function exits
    plot(x, type = "b", lwd = 1.5, main = "Plot")
    hist(x, breaks = length(x) / 4)  # nice detailed breaks
  }
  # suppressWarnings(): see Note 2
  a1 <- suppressWarnings(data.frame(
    min = min(x, na.rm = TRUE), med = median(x, na.rm = TRUE),
    mean = mean(x, na.rm = TRUE), max = max(x, na.rm = TRUE),
    sd = stats::sd(x, na.rm = TRUE),
    skew = skewness(x, na.rm = TRUE), kurt = kurtosis(x, na.rm = TRUE)))
  headh <- head(x, h)
  tailh <- tail(x, h)
  a2 <- which(is.na(x))  # positions of the NA's
  if (length(a2) == 0) a2 <- "No NA's in the series"
  l <- list(summary.stat = a1, na.at = a2, head.x = headh, tail.x = tailh,
            length.x = length(x), class.x = class.x)
  # now we format the output:
  cat("Summary Statistics:", "\n")
  print(l$summary.stat)
  cat("------", "\n")
  cat("NA's?:", "\n")
  cat(l$na.at, "\n")
  cat("------", "\n")
  cat("Head                   Tail", "\n")
  cat(l$head.x, "          ", l$tail.x, "\n")
  cat("------", "\n")
  cat("Length", "         ", "Class", "\n")
  cat(l$length.x, "          ", l$class.x, "\n")
  invisible(l)  # also return the pieces, without printing them a second time
}
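Point 3 matters most when there are many NA’s, where a long list of positions is hard to read. One possible refinement is to collapse the positions into runs, so that a block of missing values at the end shows up as a single range. Below is a minimal sketch using base R’s rle(). The helper name na_runs() is hypothetical and not part of usum as posted.

```r
# Hypothetical helper (not part of usum): report NA positions as runs, e.g. "5-6".
na_runs <- function(x) {
  r <- rle(is.na(x))                 # runs of TRUE (NA) and FALSE (non-NA)
  ends <- cumsum(r$lengths)          # last index of each run
  starts <- ends - r$lengths + 1     # first index of each run
  keep <- r$values                   # TRUE for the runs made of NA's
  if (!any(keep)) return("No NA's in the series")
  ifelse(starts[keep] == ends[keep],
         as.character(starts[keep]),
         paste0(starts[keep], "-", ends[keep]))
}

na_runs(c(1, NA, 3, 4, NA, NA))   # "2"   "5-6"
```

Inside usum, `a2 <- na_runs(x)` could replace the two lines that build a2.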

A multivariate extension is next; until then, we loop over the series. Let me know if you spot bugs, or if you know of (or have made yourself) something better. Thanks.
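Until then, one way to loop is to apply the same statistics column by column. Below is a minimal base-R sketch. The function name col_summary() is hypothetical. Skewness and kurtosis are left out so that no extra package is needed. The built-in airquality data set is used only because some of its columns contain NA’s.

```r
# Hypothetical helper: the usum statistics for every numeric column of a data frame.
# Base R only; skewness and kurtosis are omitted to avoid a package dependency.
col_summary <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]   # keep numeric columns only
  sapply(num, function(col) c(
    min  = min(col, na.rm = TRUE),
    med  = median(col, na.rm = TRUE),
    mean = mean(col, na.rm = TRUE),
    max  = max(col, na.rm = TRUE),
    sd   = sd(col, na.rm = TRUE),
    n_na = sum(is.na(col))                         # number of NA's in the column
  ))
}

res <- col_summary(airquality)
round(res, 3)   # one column per variable, one row per statistic
```

With usum() itself loaded, the literal loop would be `for (nm in names(df)) usum(df[[nm]], plot1 = FALSE)`.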
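Note:
1. This function uses the kurtosis() and skewness() functions, which are available in package “e1071” and in package “moments”. You need to load at least one of those packages to avoid errors. The two packages differ: e1071’s kurtosis() returns excess kurtosis (3 subtracted, as in the output above), while moments::kurtosis() does not subtract 3.
2. I use suppressWarnings() because sd() produces a meaningless warning that I want to ignore. min() and max() also warn, and return Inf or -Inf, when every value is NA.
3. Adapt the function to your own needs.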
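If you would rather not depend on either package, the two statistics are short to write by hand. Below is a minimal sketch, intended to match e1071’s default (type = 3) definitions: skewness is m3/s^3 and excess kurtosis is m4/s^4 - 3. Here m_r is the r-th central moment with an n denominator and s is the sample standard deviation. The names skew_b1() and kurt_b2() are hypothetical. The moments package uses the n denominator throughout, so its skewness differs slightly for small samples.

```r
# Hypothetical helpers: moment-based skewness and excess kurtosis,
# written to follow the default (type = 3) definitions in e1071.
skew_b1 <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / sd(x)^3          # m3 / s^3, with s the sample sd (n - 1 denominator)
}

kurt_b2 <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / sd(x)^4 - 3      # m4 / s^4 - 3, i.e. excess kurtosis
}

skew_b1(c(1, 1, 4))            # positive: the long tail is on the right
kurt_b2(c(1, 2, 3))            # 2/3 - 3
```

In usum, these could stand in for skewness() and kurtosis(), with na.rm = TRUE as before.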
