In this post I’ll look at different statistical hypothesis tests in R. Statistical tests can be tricky because they all have different assumptions that must be met before you can use them. Some tests require samples to be normally distributed, others require two samples to have the same variance, while others are not as restrictive.

We’ll begin with testing for normality. Then we’ll look at testing for equality of variance, with and without an assumption of normality. Finally we’ll look at testing for equality of mean, under different assumptions regarding normality and equal variance.

# Testing for Normality

A number of tests assume normality. We can check normality by visual inspection using a Q-Q plot, or we can use the Shapiro-Wilk test. Here is code for producing a Q-Q plot.

```r
x <- rnorm( 100, mean=0, sd=1 )
qqnorm( x )
```
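It can also help to overlay a reference line with `qqline()`; points that hug the line suggest the sample is plausibly normal, while systematic curvature suggests otherwise.

```r
# Q-Q plot with a reference line through the first and third quartiles
x <- rnorm( 100, mean=0, sd=1 )
qqnorm( x )
qqline( x )
```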

## Shapiro-Wilk Test

When using the Shapiro-Wilk test, it is important to recall that the null hypothesis is that the sample is normally distributed. If you get a p-value below your predefined significance level, then you may reject the null hypothesis that the sample is normally distributed.

```r
shapiro.test( x )
```

This produces the following output,

```
        Shapiro-Wilk normality test

data:  x
W = 0.9822, p-value = 0.1964
```
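For contrast, running the same test on a sample drawn from a clearly non-normal distribution (an exponential here, my own choice for illustration) should produce a very small p-value, leading us to reject normality:

```r
set.seed( 1 )                # for reproducibility
x <- rexp( 100, rate=1 )     # exponential sample: heavily right-skewed, not normal
shapiro.test( x )            # p-value well below 0.05, so we reject normality
```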

# Testing for Equal Variance for Normal Samples

Assuming that we know that our samples are normally distributed, we can perform either the F-test for two samples, or the Bartlett test for two or more samples. Note that the samples do not need to be the same size to perform these tests.

## F-Test

The F-test looks at the ratio of the variances of the two samples. If the ratio is near one, then the two samples have the same variance, but if the ratio is significantly greater or lesser than one, then the two samples have unequal variance. The null hypothesis is that the variances are equal.

```r
x <- rnorm( 100, mean=0, sd=1 )
y <- rnorm( 85, mean=1, sd=2 )
var.test( x, y )
```

```
        F test to compare two variances

data:  x and y
F = 0.2702, num df = 99, denom df = 84, p-value = 9.03e-10
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.1779818 0.4071762
sample estimates:
ratio of variances
         0.2701598
```

## Bartlett’s Test

Bartlett’s test takes two or more samples. The null hypothesis is that all of the samples have the same variance. Here, the samples `y` and `z` have the same variance, but the test, which considers all of the samples together, rejects the null hypothesis that all of the samples have equal variance.

```r
x <- rnorm( 100, mean=0, sd=1 )
y <- rnorm( 85, mean=1, sd=2 )
z <- rnorm( 75, mean=2, sd=2 )
bartlett.test( list( x, y, z ) )
```

Here is the output of the code above. Note that since the p-value is very small, we may reject the null hypothesis that all of the samples have equal variance.

```
        Bartlett test of homogeneity of variances

data:  list(x, y, z)
Bartlett's K-squared = 35.0872, df = 2, p-value = 2.404e-08
```

Alternatively, if all of the samples live in the same data frame, with sample values in one column and class labels in another, then we can use the formula syntax `bartlett.test( sample.values ~ class.labels, data.frame )`. We’ll consider the `InsectSprays` data set that ships with R as an example.

```r
data( InsectSprays )
head( InsectSprays )
```

```
  count spray
1    10     A
2     7     A
3    20     A
4    14     A
5    14     A
6    12     A
```

```r
bartlett.test( count ~ spray, InsectSprays )
```

```
        Bartlett test of homogeneity of variances

data:  count by spray
Bartlett's K-squared = 25.9598, df = 5, p-value = 9.085e-05
```

Another way to do this would have been to look at the `aggregate()` function, which has a similar syntax. The `aggregate()` function aggregates a numerical value according to a categorical value, using some (aggregate) function, like `mean()` or `var()`. It’s like group-by in SQL or pandas.

```r
aggregate( count ~ spray, InsectSprays, var )
```

```
  spray     count
1     A 22.272727
2     B 18.242424
3     C  3.901515
4     D  6.265152
5     E  3.000000
6     F 38.606061
```

# Testing for Equal Variance for Non-Normal Samples

We have three non-parametric tests for equal variance: the Ansari-Bradley and Mood tests for two samples, and the Fligner-Killeen test for multiple samples.

## Ansari-Bradley Test

The null hypothesis is that the variances, or scale parameters, are equal.

```r
x <- rbeta( 100, 1, 3 )
y <- rbeta( 100, 1, 4 )
ansari.test( x, y )
```

```
        Ansari-Bradley test

data:  x and y
AB = 4808, p-value = 0.237
alternative hypothesis: true ratio of scales is not equal to 1
```

## Mood Test

Again, the null hypothesis is that the variances, or scale parameter, are equal.

```r
x <- rbeta( 100, 1, 3 )
y <- rbeta( 100, 1, 4 )
mood.test( x, y )
```

```
        Mood two-sample test of scale

data:  x and y
Z = 1.0599, p-value = 0.2892
alternative hypothesis: two.sided
```

## Fligner-Killeen Test

Like Bartlett’s test, the Fligner-Killeen test takes either a list of vectors, or a formula, i.e., the data and the class labels delimited by a tilde. The null hypothesis is that all of the variances are equal.

```r
x <- rbeta( 20, 1, 2 )
y <- rbeta( 30, 1, 3 )
fligner.test( list( x, y ) )
```

```
        Fligner-Killeen test of homogeneity of variances

data:  list(x, y)
Fligner-Killeen:med chi-squared = 1.9883, df = 1, p-value = 0.1585
```

```r
data( ChickWeight )
fligner.test( weight ~ Diet, ChickWeight )
```

```
        Fligner-Killeen test of homogeneity of variances

data:  weight by Diet
Fligner-Killeen:med chi-squared = 30.5146, df = 3, p-value = 1.076e-06
```

# Testing for Equal Means for Normal Samples

There are two flavors of the t-test: one for samples with equal variance, and another called Welch’s test for samples with unequal variance.

## t-Test

Assuming we have normal samples, with equal variances, we can use the t-test.

```r
x <- rnorm( 10, 1, 1 )
y <- rnorm( 10, 2, 1 )
t.test( x, y, var.equal=TRUE )
```

```
        Two Sample t-test

data:  x and y
t = -3.3745, df = 18, p-value = 0.003377
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.8067299 -0.6528291
sample estimates:
mean of x mean of y
 0.753695  2.483474
```

## Welch’s Two Sample t-Test

If the samples are normal, but the variances are not equal, then we can use Welch’s test.

```r
x <- rnorm( 10, 1, 1 )
y <- rnorm( 10, 2, 2 )
t.test( x, y, var.equal=FALSE )
```

```
        Welch Two Sample t-test

data:  x and y
t = -2.4765, df = 15.083, p-value = 0.02559
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.3787786 -0.1786919
sample estimates:
mean of x mean of y
 1.182663  2.461399
```
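Note that `var.equal=FALSE` is actually the default in `t.test()`, so omitting the argument also runs Welch’s test. A quick way to confirm which flavor ran is to inspect the `method` element of the result:

```r
x <- rnorm( 10, 1, 1 )
y <- rnorm( 10, 2, 2 )
t.test( x, y )$method    # reports "Welch Two Sample t-test" by default
```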

# Testing for Equal Means for Non-Normal Samples

## Wilcoxon Rank Sum Test

This is equivalent to the Mann-Whitney test. We set `paired=FALSE` to signify that we are not using paired observations, as in a before-and-after study. Setting `paired=TRUE` would run a Wilcoxon Signed Rank Test instead. The Rank Sum Test technically tests for a difference in distribution, but we can use this to determine a difference in the mean of two distributions as well.

```r
x <- rbeta( 30, 3, 6 )
y <- rbeta( 30, 6, 3 )
wilcox.test( x, y, paired=FALSE )
```
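For completeness, here is a sketch of the paired variant on made-up before-and-after data (the data below are my own illustration, not from the tests above): each "after" value is its "before" value plus a consistent upward shift, so the signed rank test should detect the change.

```r
set.seed( 1 )
before <- rnorm( 20, mean=5, sd=1 )
after  <- before + rnorm( 20, mean=1, sd=0.5 )   # consistent upward shift per subject
wilcox.test( before, after, paired=TRUE )        # small p-value: reject equal location
```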

## Kruskal-Wallis Test

Like the other multiple sample tests, we can use either a list or a formula to enter the data for the test.

```r
data( ChickWeight )
kruskal.test( weight ~ Diet, ChickWeight )
```

```
        Kruskal-Wallis rank sum test

data:  weight by Diet
Kruskal-Wallis chi-squared = 24.4499, df = 3, p-value = 2.012e-05
```