In this post I’ll look at different statistical hypothesis tests in R. Statistical tests can be tricky because they all have different assumptions that must be met before you can use them. Some tests require samples to be normally distributed, others require two samples to have the same variance, while others are not as restrictive.

We’ll begin with testing for normality. Then we’ll look at testing for equality of variance, with and without an assumption of normality. Finally we’ll look at testing for equality of mean, under different assumptions regarding normality and equal variance.

Testing for Normality

A number of tests assume normality. We can ascertain normality by visual inspection using a Q-Q plot, or we can use the Shapiro-Wilk test. Here is code and a figure for producing Q-Q plots.

x <- rnorm( 100, mean=0, sd=1 )
qqnorm( x )

Shapiro-Wilk Test

When using the Shapiro-Wilk test, it is important to recall that the null hypothesis the that the sample is normal. If you get a p-value below your predefined significance level $\alpha$ , then you may reject the null hypothesis that the sample is normally distributed.

shapiro.test( x )

This produces the following output,

    Shapiro-Wilk normality test

data:  x
W = 0.9822, p-value = 0.1964

Testing for Equal Variance for Normal Samples

Assuming that we know that our samples are normally distributed, we can perform either the F-test for two samples, or the Bartlett test for two or more samples. Note that the samples do not need to be the same size to perform these tests.

F-Test

The F-test looks at the ratio of the variances of the two samples. If the ratio is near one, then the two samples have the same variance, but if the ratio is significantly greater or lesser than one, then the two samples have unequal variance. The null hypothesis is that the variances are equal.

x <- rnorm( 100, mean=0, sd=1 )
y <- rnorm(  85, mean=1, sd=2 )
var.test( x, y )

    F test to compare two variances

data:  x and y
F = 0.2702, num df = 99, denom df = 84, p-value = 9.03e-10
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.1779818 0.4071762
sample estimates:
ratio of variances 
         0.2701598

Bartlett’s Test

Bartlett’s test takes two or more samples. The null hypothesis is that all of the samples have the same variance. Here, the samples y and z have the same variance, but that the test, which tests all of the samples, rejects the null hypothesis that all of the samples have equal variance.

x <- rnorm( 100, mean=0, sd=1 )
y <- rnorm(  85, mean=1, sd=2 )
z <- rnorm(  75, mean=2, sd=2 )
bartlett.test( list( x, y, z ) )

Here is the output of the code above. Note that since the p-value is very small, we may reject the null hypothesis that all of the samples have equal variance.

    Bartlett test of homogeneity of variances

data:  list(x, y, z)
Bartlett's K-squared = 35.0872, df = 2, p-value = 2.404e-08

Alternatively, if all of the samples live in the same data frame, with sample values in one column and class labels in another, then we can use the following syntax: bartlett.test( sample.values ~ class.labels, data.frame ). We’ll consider the InsectSprays data set that ships with R as an example.

data(InsectSprays)
head(InsectSprays)

  count spray
1    10     A
2     7     A
3    20     A
4    14     A
5    14     A
6    12     A

bartlett.test( count ~ spray, InsectSprays )

    Bartlett test of homogeneity of variances

data:  count by spray
Bartlett's K-squared = 25.9598, df = 5, p-value = 9.085e-05

Another way to do this would have been to look at the aggregate() function, which has a similar syntax. The aggregate() function aggregates a numerical value according to a categorical value, using some (aggregate) function, like mean() or var(). It’s like group-by in SQL or pandas.

aggregate( count ~ spray, InsectSprays, var )

  spray     count
1     A 22.272727
2     B 18.242424
3     C  3.901515
4     D  6.265152
5     E  3.000000
6     F 38.606061

Testing for Equal Variance for Non-Normal Samples

We have three non-parametric test for equal variance: the Ansari-Bradley and Mood tests for two samples, and the Fligner-Killeen test for multiple samples.

Ansari-Bradley Test

The null hypothesis is that the variances, or scale parameter are equal.

x <- rbeta( 100, 1, 3 )
y <- rbeta( 100, 1, 4 )
ansari.test( x, y )

    Ansari-Bradley test

data:  x and y
AB = 4808, p-value = 0.237
alternative hypothesis: true ratio of scales is not equal to 1

Mood Test

Again, the null hypothesis is that the variances, or scale parameter, are equal.

x <- rbeta( 100, 1, 3 )
y <- rbeta( 100, 1, 4 )
mood.test( x, y )

    Mood two-sample test of scale

data:  x and y
Z = 1.0599, p-value = 0.2892
alternative hypothesis: two.sided

Fligner-Killeen Test

Like the syntax of the Bartlett test, the Fligner-Killeen test takes either a list of vectors, or a formula, i.e., the data and the class labels delimited by a tilde. The null hypothesis is that all of the variances are equal.

x <- rbeta( 20, 1, 2 )
y <- rbeta( 30, 1, 3 )
fligner.test( list( x, y ) )

    Fligner-Killeen test of homogeneity of variances

data:  list(x, y)
Fligner-Killeen:med chi-squared = 1.9883, df = 1, p-value = 0.1585

data(ChickWeight)
fligner.test( weight ~ Diet, ChickWeight )

    Fligner-Killeen test of homogeneity of variances

data:  weight by Diet
Fligner-Killeen:med chi-squared = 30.5146, df = 3, p-value = 1.076e-06

Testing for Equal Means for Normal Samples

There are two flavors of the t-test: one for samples with equal variance, and another called Welch’s test for samples with unequal variance.

t-Test

Assuming we have normal samples, with equal variances, we can use the t-test.

x <- rnorm( 10, 1, 1 )
y <- rnorm( 10, 2, 1 )
t.test( x, y, var.equal=TRUE )

    Two Sample t-test

data:  x and y
t = -3.3745, df = 18, p-value = 0.003377
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.8067299 -0.6528291
sample estimates:
mean of x mean of y 
 0.753695  2.483474

Welch’s Two Sample t-Test

If the samples are normal, but the variances are not equal, then we can use Welch’s test.

x <- rnorm( 10, 1, 1 )
y <- rnorm( 10, 2, 2 )
t.test( x, y, var.equal=FALSE )

    Welch Two Sample t-test

data:  x and y
t = -2.4765, df = 15.083, p-value = 0.02559
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.3787786 -0.1786919
sample estimates:
mean of x mean of y 
 1.182663  2.461399

Testing for Equal Means for Non-Normal Samples

Wilcoxon Rank Sum Test

This is equivalent to the Mann-Whitney test. We set paired=FALSE to signify that we are not using paired observations, as in a before-and-after study. Setting paired=TRUE would run a
Wilcoxon Signed Rank Test instead. The Rank Sum Test technically tests for a difference in distribution, but we can use this to determine a difference in the mean of two distributions as well.

x <- rbeta( 30, 3, 6 )
y <- rbeta( 30, 6, 3 )
wilcox.test( x, y, paired=FALSE )

Kurskal-Wallis Test

Like the other multiple sample tests, we can use either a list or a formula to enter the data for the test.

data(ChickWeight)
kurskal.test( weight ~ Diet, ChickWeight )

    Kruskal-Wallis rank sum test

data:  weight by Diet
Kruskal-Wallis chi-squared = 24.4499, df = 3, p-value = 2.012e-05

Connor Johnson

Hypothesis Testing in R