The summary function in R returns:
```r
summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   9.14   10.70   11.10   11.30   12.10   13.60
```
For the univariate case I wrote what I consider to be a better summary function which returns:
```r
usum(x) # For univariate Summary
Summary Statistics:
   min   med  mean   max    sd   skew    kurt
1 9.14 11.13 11.35 13.65 1.057 0.3028 -0.6389
------
NA's?:
No NA's in the series
------
Head Tail
13.65 13.55   13.08 13.13
------
Length   Class
207   numeric
```
The ideas behind it are:
1. I don’t really care about the quartiles, I can always create the box-plot or the histogram.
2. I think the standard deviation, skewness and kurtosis are interesting.
3. I want to know not only whether I have NA’s but also where they are. It is somewhat less disturbing to have two consecutive NA’s at the end of the series than at random locations, e.g.:
```r
usum(c(x,NA)) # will return:
Summary Statistics:
   min   med  mean   max    sd   skew    kurt
1 9.14 11.13 11.35 13.65 1.057 0.3028 -0.6389
------
NA's?:
208 # the location of your NA's
------
Head Tail
13.65 13.55   13.13 NA
------
Length   Class
208   numeric
```
4. I always want to know the length and at least one of “head” or “tail”, to double-check that I loaded the series I intended and not a different series by mistake.
5. Visualize: the function has a “plot1” argument which, when set to “TRUE” (the default), draws a (2,1) figure with a standard plot of the series in the upper half and a histogram in the lower half.
```r
# skewness() and kurtosis() come from "e1071" (package "moments" also has them)
library(e1071)

usum = function(x, plot1 = TRUE, h = 2){
  # The h is how many head/tail you want to see
  # is.factor() is safer than class(x) == "factor" for objects with several classes
  if(is.factor(x)) stop("class factor not supported, use standard summary function")
  class.x = class(x)
  if(plot1){
    par(mfrow = c(2,1))
    plot(x, type = "b", lwd = 1.5, main = "Plot")
    hist(x, breaks = length(x)/4) # Nice detailed breaks
  }
  a1 = suppressWarnings(data.frame(
    min = min(x, na.rm = TRUE), med = median(x, na.rm = TRUE),
    mean = mean(x, na.rm = TRUE), max = max(x, na.rm = TRUE),
    sd = stats::sd(x, na.rm = TRUE),
    skew = skewness(x, na.rm = TRUE), kurt = kurtosis(x, na.rm = TRUE)))
  headh = head(x, h) ; tailh = tail(x, h)
  a2 = which(is.na(x)) # positions of the NA's, if any
  if(length(a2) == 0) a2 = c("No NA's in the series")
  l = list(summary.stat = a1, na.at = a2, head.x = headh, tail.x = tailh,
           length.x = length(x), class.x = class.x)
  # now we format the output:
  cat("Summary Statistics:","\n")
  print(l$summary.stat)
  cat("------","\n")
  cat("NA's?:","\n")
  cat(l$na.at,"\n")
  cat("------","\n")
  cat("Head Tail","\n")
  cat(l$head.x," ",l$tail.x,"\n")
  cat("------","\n")
  cat("Length"," ","Class","\n")
  cat(l$length.x," ",l$class.x,"\n")
}
```
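Point 3 in the list above is about where the NA’s sit, not just whether they exist. As a rough sketch of that idea, the helper below groups NA positions into consecutive runs, so a single block at the end of the series is easy to tell apart from scattered gaps. The name “na_runs” and the sample vector are made up for illustration; this is not part of the function above.

```r
# Hypothetical helper (not part of usum): group NA positions into runs.
na_runs = function(x){
  idx = which(is.na(x))                      # positions of the NA's
  if(length(idx) == 0)
    return(data.frame(start = integer(0), end = integer(0)))
  # a new run starts wherever the gap to the previous NA position is not 1
  run.id = cumsum(c(1, diff(idx) != 1))
  data.frame(start = as.integer(tapply(idx, run.id, min)),
             end   = as.integer(tapply(idx, run.id, max)))
}

x.demo = c(1.2, NA, 3.4, 5.6, NA, NA)        # made-up series
na_runs(x.demo)  # one row per run of NA's, with its first and last position
```

A run whose “end” equals the length of the series is the less worrying kind of gap mentioned in point 3.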
Next is a multivariate extension; until then, we loop. Let me know if you spot bugs or know of (or have made changes yourself to create) something better. Thanks.
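Until the multivariate version exists, looping over columns could look something like the sketch below. To keep it runnable on its own it uses a small stand-in, “basic_stats”, and a made-up data frame; both are assumptions for illustration. In practice you would call the univariate function on each column instead, probably with “plot1 = FALSE”.

```r
# Stand-in for the univariate summary so this sketch runs on its own.
basic_stats = function(x){
  c(min  = min(x, na.rm = TRUE),  med = median(x, na.rm = TRUE),
    mean = mean(x, na.rm = TRUE), max = max(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE),   n.na = sum(is.na(x)))
}

df = data.frame(a = c(1, 2, 3, NA), b = c(10, 20, 30, 40),
                g = c("u", "v", "u", "v"))            # made-up data
num.cols = df[vapply(df, is.numeric, logical(1))]     # keep numeric columns only
stats = t(sapply(num.cols, basic_stats))              # one row per column
print(stats)
```

Filtering to numeric columns first mirrors the univariate function's refusal to handle factors.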
Note:
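1. This function uses the “kurtosis” and “skewness” functions; both can be found in package “e1071” and in package “moments”, so you need at least one of those packages loaded to avoid errors. The two packages differ in convention: “kurtosis” in “e1071” returns excess kurtosis (about 0 for normal data), while “kurtosis” in “moments” does not subtract 3 (about 3 for normal data). The negative value in the output above is excess kurtosis.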
2. I use “suppressWarnings” since the function “sd” produces a meaningless warning that I want to ignore.
3. Alter the function for your own needs.
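If you would rather not depend on either package (note 1), simple moment-based versions are easy to write. The sketch below is one possible way, using central moments with divisor n; the names “skew_simple” and “kurt_excess_simple” are hypothetical, and the values can differ slightly from the package results depending on which estimator type the package uses.

```r
# Hypothetical package-free helpers; moment-based estimators with divisor n.
skew_simple = function(x){
  x = x[!is.na(x)]
  d = x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

kurt_excess_simple = function(x){
  x = x[!is.na(x)]
  d = x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3   # subtract 3 so normal data gives about 0
}

z = c(1, 2, 3, 4, 10)           # made-up, right-skewed sample
skew_simple(z)                  # positive for a long right tail
kurt_excess_simple(z)
```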