In this post I’ll look at different statistical hypothesis tests in R. Statistical tests can be tricky because they all have different assumptions that must be met before you can use them. Some tests require samples to be normally distributed, others require two samples to have the same variance, while others are not as restrictive.
We’ll begin with testing for normality. Then we’ll look at testing for equality of variance, with and without an assumption of normality. Finally we’ll look at testing for equality of mean, under different assumptions regarding normality and equal variance.
Testing for Normality
A number of tests assume normality. We can ascertain normality by visual inspection using a Q-Q plot, or we can use the Shapiro-Wilk test. Here is code and a figure for producing Q-Q plots.
x <- rnorm( 100, mean=0, sd=1 ) qqnorm( x )
Shapiro-Wilk Test
When using the Shapiro-Wilk test, it is important to recall that the null hypothesis the that the sample is normal. If you get a p-value below your predefined significance level , then you may reject the null hypothesis that the sample is normally distributed.
shapiro.test( x )
This produces the following output,
Shapiro-Wilk normality test data: x W = 0.9822, p-value = 0.1964
Testing for Equal Variance for Normal Samples
Assuming that we know that our samples are normally distributed, we can perform either the F-test for two samples, or the Bartlett test for two or more samples. Note that the samples do not need to be the same size to perform these tests.
F-Test
The F-test looks at the ratio of the variances of the two samples. If the ratio is near one, then the two samples have the same variance, but if the ratio is significantly greater or lesser than one, then the two samples have unequal variance. The null hypothesis is that the variances are equal.
x <- rnorm( 100, mean=0, sd=1 ) y <- rnorm( 85, mean=1, sd=2 ) var.test( x, y )
F test to compare two variances data: x and y F = 0.2702, num df = 99, denom df = 84, p-value = 9.03e-10 alternative hypothesis: true ratio of variances is not equal to 1 95 percent confidence interval: 0.1779818 0.4071762 sample estimates: ratio of variances 0.2701598
Bartlett’s Test
Bartlett’s test takes two or more samples. The null hypothesis is that all of the samples have the same variance. Here, the samples y
and z
have the same variance, but that the test, which tests all of the samples, rejects the null hypothesis that all of the samples have equal variance.
x <- rnorm( 100, mean=0, sd=1 ) y <- rnorm( 85, mean=1, sd=2 ) z <- rnorm( 75, mean=2, sd=2 ) bartlett.test( list( x, y, z ) )
Here is the output of the code above. Note that since the p-value is very small, we may reject the null hypothesis that all of the samples have equal variance.
Bartlett test of homogeneity of variances data: list(x, y, z) Bartlett's K-squared = 35.0872, df = 2, p-value = 2.404e-08
Alternatively, if all of the samples live in the same data frame, with sample values in one column and class labels in another, then we can use the following syntax: bartlett.test( sample.values ~ class.labels, data.frame )
. We’ll consider the InsectSprays
data set that ships with R as an example.
data(InsectSprays) head(InsectSprays)
count spray 1 10 A 2 7 A 3 20 A 4 14 A 5 14 A 6 12 A
bartlett.test( count ~ spray, InsectSprays )
Bartlett test of homogeneity of variances data: count by spray Bartlett's K-squared = 25.9598, df = 5, p-value = 9.085e-05
Another way to do this would have been to look at the aggregate()
function, which has a similar syntax. The aggregate()
function aggregates a numerical value according to a categorical value, using some (aggregate) function, like mean()
or var()
. It’s like group-by in SQL or pandas.
aggregate( count ~ spray, InsectSprays, var )
spray count 1 A 22.272727 2 B 18.242424 3 C 3.901515 4 D 6.265152 5 E 3.000000 6 F 38.606061
Testing for Equal Variance for Non-Normal Samples
We have three non-parametric test for equal variance: the Ansari-Bradley and Mood tests for two samples, and the Fligner-Killeen test for multiple samples.
Ansari-Bradley Test
The null hypothesis is that the variances, or scale parameter are equal.
x <- rbeta( 100, 1, 3 ) y <- rbeta( 100, 1, 4 ) ansari.test( x, y )
Ansari-Bradley test data: x and y AB = 4808, p-value = 0.237 alternative hypothesis: true ratio of scales is not equal to 1
Mood Test
Again, the null hypothesis is that the variances, or scale parameter, are equal.
x <- rbeta( 100, 1, 3 ) y <- rbeta( 100, 1, 4 ) mood.test( x, y )
Mood two-sample test of scale data: x and y Z = 1.0599, p-value = 0.2892 alternative hypothesis: two.sided
Fligner-Killeen Test
Like the syntax of the Bartlett test, the Fligner-Killeen test takes either a list of vectors, or a formula, i.e., the data and the class labels delimited by a tilde. The null hypothesis is that all of the variances are equal.
x <- rbeta( 20, 1, 2 ) y <- rbeta( 30, 1, 3 ) fligner.test( list( x, y ) )
Fligner-Killeen test of homogeneity of variances data: list(x, y) Fligner-Killeen:med chi-squared = 1.9883, df = 1, p-value = 0.1585
data(ChickWeight) fligner.test( weight ~ Diet, ChickWeight )
Fligner-Killeen test of homogeneity of variances data: weight by Diet Fligner-Killeen:med chi-squared = 30.5146, df = 3, p-value = 1.076e-06
Testing for Equal Means for Normal Samples
There are two flavors of the t-test: one for samples with equal variance, and another called Welch’s test for samples with unequal variance.
t-Test
Assuming we have normal samples, with equal variances, we can use the t-test.
x <- rnorm( 10, 1, 1 ) y <- rnorm( 10, 2, 1 ) t.test( x, y, var.equal=TRUE )
Two Sample t-test data: x and y t = -3.3745, df = 18, p-value = 0.003377 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -2.8067299 -0.6528291 sample estimates: mean of x mean of y 0.753695 2.483474
Welch’s Two Sample t-Test
If the samples are normal, but the variances are not equal, then we can use Welch’s test.
x <- rnorm( 10, 1, 1 ) y <- rnorm( 10, 2, 2 ) t.test( x, y, var.equal=FALSE )
Welch Two Sample t-test data: x and y t = -2.4765, df = 15.083, p-value = 0.02559 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -2.3787786 -0.1786919 sample estimates: mean of x mean of y 1.182663 2.461399
Testing for Equal Means for Non-Normal Samples
Wilcoxon Rank Sum Test
This is equivalent to the Mann-Whitney test. We set paired=FALSE
to signify that we are not using paired observations, as in a before-and-after study. Setting paired=TRUE
would run a
Wilcoxon Signed Rank Test instead. The Rank Sum Test technically tests for a difference in distribution, but we can use this to determine a difference in the mean of two distributions as well.
x <- rbeta( 30, 3, 6 ) y <- rbeta( 30, 6, 3 ) wilcox.test( x, y, paired=FALSE )
Kurskal-Wallis Test
Like the other multiple sample tests, we can use either a list or a formula to enter the data for the test.
data(ChickWeight) kurskal.test( weight ~ Diet, ChickWeight )
Kruskal-Wallis rank sum test data: weight by Diet Kruskal-Wallis chi-squared = 24.4499, df = 3, p-value = 2.012e-05