Chapter 7. Comparing averages with two (or one) groups | Experimental design and data analysis

This chapter explores how to compare the average of one group with either the average of another group or a specified value, for simple experimental designs.

For instance, one of the most common questions that researchers ask is, “Do we have evidence that the average value of a trait in one group differs from the average for another group?”

Alternatively, a researcher might wish to test whether the mean value of a group differs from a specified value (e.g., Does the mean temperature of a group of subjects in a cold room differ from 37 degrees Celsius?). This chapter explores t-tests as a way to answer these questions.

Note that the videos in this chapter discuss the results of t-tests in terms of statistical significance (we will update the videos in this respect). We therefore draw your attention to the Practice Problems and Answers for this chapter, which demonstrate a more modern interpretation of t-tests by focusing on effect size.

The Practice Problems and Answers also follow the advice in the next chapter by abandoning the concept of statistical significance, and they show how to test assumptions.

A note on non-parametric tests

This website does not consider non-parametric tests to compare measures of central tendency (e.g., median value or average value) between groups for two reasons. First, non-parametric tests involve (often unappreciated) assumptions that hinder analyses. For example, researchers often use a Mann-Whitney U test to analyse data that fail to meet the assumptions of a t-test; here, researchers aim to evaluate evidence that the median values differ between the groups.

This approach can be problematic, however, because a Mann-Whitney U test provides evidence for whether two distributions differ in general (i.e., including in shape), not specifically for whether their median values differ. Therefore, if the two distributions being compared differ in shape, then a small p-value from a Mann-Whitney U test might arise (at least in part) from the shape differences, and not from differences in median values.

In other words, in order to use a Mann-Whitney U test to evaluate evidence for different median values between groups, the researcher must be confident that the two distributions have similar shapes. Such unappreciated assumptions make non-parametric tests less desirable. Second, non-parametric methods cannot provide meaningful estimates of effect size with appropriate uncertainty.
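The caveat about distribution shape can be checked with a small simulation. The sketch below is only an illustration under assumptions of our own choosing (the sample size, the number of simulations, the two mirror-image exponential distributions, and the cut-off of 0.01 for a "small" p-value are all arbitrary). Both simulated groups come from populations with a median of exactly zero, so any tendency towards small p-values reflects the difference in shape.

```r
# Two populations with the SAME median (0) but mirror-image shapes.
# All settings here are illustrative assumptions, not recommendations.
set.seed(1)
n_per_group <- 300   # observations per group in each simulated data set
n_sims <- 500        # number of simulated data sets

p_values <- replicate(n_sims, {
  # The median of an Exponential(1) variable is log(2), so subtracting
  # log(2) centres the median on zero.
  right_skewed <- rexp(n_per_group) - log(2)      # long right tail
  left_skewed  <- -(rexp(n_per_group) - log(2))   # long left tail
  wilcox.test(right_skewed, left_skewed)$p.value  # Mann-Whitney U test
})

# Share of simulated data sets giving a small p-value,
# even though the population medians are identical.
share_small_p <- mean(p_values < 0.01)
share_small_p
```

If the test responded only to differences in median values, small p-values would be rare here; a large share suggests that the test is responding to shape.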

As analyses of effect size offer more insight than those that focus on p-values (see Chapter 8: Abandon statistical significance), non-parametric tests have less to offer than alternative methods, such as computational approaches (randomization / permutation tests, bootstrapping).
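To give a flavour of one such computational approach, here is a minimal sketch of a permutation (randomization) test for a difference in means. The data, the number of permutations, and all object names are hypothetical and chosen only for illustration; this is a generic textbook version, not a method taken from this chapter's materials.

```r
# Generic permutation test for a difference in group means.
# The data below are made up purely for illustration.
set.seed(42)
group_a <- c(12.1, 14.3, 11.8, 15.2, 13.7, 12.9, 14.8, 13.1)
group_b <- c(15.9, 14.2, 16.8, 17.1, 15.3, 16.4, 14.9, 17.5)

# Effect size of interest: the observed difference in means.
observed_diff <- mean(group_b) - mean(group_a)

values <- c(group_a, group_b)
n_a <- length(group_a)
n_perm <- 10000   # number of random relabellings (an arbitrary choice)

# Shuffle the group labels many times and recompute the difference,
# which shows the differences expected if group membership were irrelevant.
perm_diffs <- replicate(n_perm, {
  shuffled <- sample(values)
  mean(shuffled[-seq_len(n_a)]) - mean(shuffled[seq_len(n_a)])
})

# Two-sided p-value: share of shuffles at least as extreme as the observed
# difference (the +1 terms count the observed arrangement itself).
p_value <- (sum(abs(perm_diffs) >= abs(observed_diff)) + 1) / (n_perm + 1)

observed_diff
p_value
```

Note that a permutation test of this kind assumes the observations are exchangeable between groups if group membership has no effect, so it is not assumption-free either.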

t-test: 2-sample

How can we test the hypothesis that the average values from two different groups are different? For simple experimental designs, researchers most commonly address this question with a 2-sample t-test, the subject of this video. We discuss when 2-sample t-tests are useful, how they work, and how to perform one in R.

t-test- 2 sample (417.35 KB / PPTX)
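As a quick reference alongside the video, the sketch below shows one way to run a 2-sample t-test in R with the built-in t.test() function. The data frame and its values are hypothetical. Setting var.equal = TRUE requests the classic pooled-variance test, because t.test() otherwise defaults to Welch's version.

```r
# Hypothetical data: a trait measured in a control and a treatment group.
trait <- data.frame(
  group = rep(c("control", "treatment"), each = 8),
  value = c(12.1, 14.3, 11.8, 15.2, 13.7, 12.9, 14.8, 13.1,
            15.9, 14.2, 16.8, 17.1, 15.3, 16.4, 14.9, 17.5)
)

# Classic 2-sample t-test (assumes similar variance in the two groups).
student_fit <- t.test(value ~ group, data = trait, var.equal = TRUE)

student_fit$estimate    # mean of each group
student_fit$conf.int    # 95% CI for (control mean - treatment mean)
student_fit$parameter   # degrees of freedom: n1 + n2 - 2 = 14
```

The estimated difference between the group means and its confidence interval describe the effect size, which is usually more informative than the p-value alone.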

t-test: Welch’s (2-sample)

This video discusses Welch’s t-test as an alternative to a 2-sample t-test. A Welch’s t-test is very similar to a 2-sample t-test, but does not require the groups being compared to have similar variance. As a result, a Welch’s t-test is a bit easier to use than a 2-sample t-test. Perhaps this is why a Welch’s t-test is the default type of t-test in R (when comparing between groups). This video discusses when Welch’s t-tests are useful, how they work, and how to perform one in R.

t-test - welch (485.32 KB / PPTX)
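The following sketch compares the two tests on hypothetical data in which one group is far more variable than the other. The vectors and object names are invented for illustration. Calling t.test() with two samples and no var.equal argument runs Welch's test.

```r
# Hypothetical data: similar means are possible, but very different spread.
narrow <- c(10.2, 10.5, 9.8, 10.1, 10.4, 9.9, 10.3, 10.0)
wide   <- c(8.1, 13.9, 10.7, 15.2, 7.4, 12.8, 16.3, 9.6)

welch_fit   <- t.test(wide, narrow)                    # default: Welch's t-test
student_fit <- t.test(wide, narrow, var.equal = TRUE)  # pooled-variance t-test

# Both tests estimate the same group means ...
welch_fit$estimate
student_fit$estimate

# ... but Welch's test adjusts the degrees of freedom downwards when the
# variances differ, which changes the confidence interval.
welch_fit$parameter
student_fit$parameter
welch_fit$conf.int
student_fit$conf.int
```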

t-test: 1 sample

Sometimes a researcher wishes to compare a mean value to a fixed value of interest. In this case, a 1-sample t-test may be appropriate. This video discusses when 1-sample t-tests are useful, how they work, and how to perform one in R.

t-test - 1 sample (338.24 KB / PPTX)
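Returning to the cold-room example from the chapter introduction, a 1-sample t-test in R might look like the sketch below. The temperature values are hypothetical; the mu argument of t.test() sets the fixed value against which the sample mean is compared.

```r
# Hypothetical body temperatures (degrees Celsius) of 10 subjects in a cold room.
body_temp <- c(36.4, 36.9, 36.6, 37.1, 36.5, 36.8, 36.3, 36.7, 36.6, 36.9)

# Compare the sample mean with a reference value of 37 degrees Celsius.
one_sample_fit <- t.test(body_temp, mu = 37)

mean(body_temp) - 37       # effect size: difference from the reference value
one_sample_fit$conf.int    # 95% CI for the mean temperature
one_sample_fit$parameter   # degrees of freedom: n - 1 = 9
```

If the confidence interval for the mean lies entirely below 37, the data are consistent with a lower mean temperature, and the interval shows how much lower it plausibly is.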

t-test: paired

For experimental designs with paired data, a paired t-test offers a more powerful approach to analysing data than a 2-sample t-test or a Welch’s t-test. This video discusses paired t-tests: what are paired data? When are paired t-tests useful? How do they work? How do I perform a paired t-test in R?

t-test - paired (883.83 KB / PPTX)
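The sketch below uses hypothetical before/after measurements on the same eight subjects to illustrate two points: a paired t-test is equivalent to a 1-sample t-test on the within-pair differences, and ignoring the pairing can give a much wider confidence interval when subjects differ a lot from one another. All values and object names are invented for illustration.

```r
# Hypothetical paired data: the same 8 subjects measured before and after
# a treatment. Subjects differ a lot from each other, but each changes
# by a similar small amount.
before <- c(52, 61, 47, 70, 58, 66, 49, 63)
after  <- c(55, 63, 50, 74, 60, 70, 51, 66)

paired_fit   <- t.test(after, before, paired = TRUE)  # uses the pairing
diff_fit     <- t.test(after - before, mu = 0)        # 1-sample test on differences
unpaired_fit <- t.test(after, before)                 # Welch's test, ignores pairing

# The paired test and the 1-sample test on differences give the same t-value.
all.equal(unname(paired_fit$statistic), unname(diff_fit$statistic))

# Compare the precision of the estimated change with and without pairing.
paired_fit$conf.int
unpaired_fit$conf.int
```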

Bootstrapping to compare differences between two groups

Bootstrapping provides a useful method to analyse data when the data do not meet the assumptions of other approaches. We do not explain bootstrapping in detail here, but will do so in other chapters in the future.
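To give a rough sense of the general idea in the meantime, the sketch below computes a simple percentile bootstrap confidence interval for a difference in medians. It is a generic textbook version written under our own assumptions (hypothetical data, 5000 resamples, a percentile interval), and it should not be read as an implementation of any specific published method.

```r
# Generic percentile bootstrap for a difference in medians.
# Hypothetical, right-skewed data; all settings are illustrative assumptions.
set.seed(123)
group_a <- c(2.1, 2.4, 2.2, 3.9, 2.8, 2.5, 6.7, 2.3, 3.1, 2.6)
group_b <- c(3.4, 4.1, 3.8, 9.2, 3.6, 4.5, 5.3, 3.9, 4.0, 7.8)

# Effect size of interest: the observed difference in medians.
observed_diff <- median(group_b) - median(group_a)

n_boot <- 5000   # number of bootstrap resamples (an arbitrary choice)

# Resample each group with replacement, keeping the original group sizes,
# and recompute the difference in medians each time.
boot_diffs <- replicate(n_boot, {
  resample_a <- sample(group_a, replace = TRUE)
  resample_b <- sample(group_b, replace = TRUE)
  median(resample_b) - median(resample_a)
})

# Percentile 95% interval for the difference in medians.
boot_ci <- quantile(boot_diffs, c(0.025, 0.975))

observed_diff
boot_ci
```

With samples this small, a percentile interval for a median can be coarse, so treat the output as a demonstration of the mechanics rather than a recommended analysis.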

For now, we highlight a bootstrapping method by Johnston & Faulkner (2021) that compares median values between two groups when the data are not normally distributed and / or variance differs between the groups. We do not explain the method here, but encourage you to read their paper for details. We do, however, demonstrate an analysis using their method at the link below.

Article in New Phytologist re: a bootstrap approach (909.34 KB / PDF)

Please note that bootstrapping does involve some assumptions that must be met. In this light, we are surprised that Johnston & Faulkner (2021) claim that their method works when variances differ between the groups, because previous bootstrapping methods have encountered problems with unequal variance (see Manly (1995) and Hayes (2000) for discussion).

Article in JSTOR re: randomisation tests (1.74 MB / PDF)

Article in Animal behaviour re: randomisation tests (80.9 KB / PDF)

Therefore, we highlight the method of Johnston & Faulkner (2021) from an optimistic perspective. In the future, we plan to work out why their method is robust to unequal variance (or, alternatively, whether their assertion that the method works with unequal variance is incorrect). We will feel reassured after sorting this out, and will update this website accordingly.

Johnston & Faulkner (2021) method. (244.39 KB / )

Practice problems and answers

Experimental data chapter 7 T-test problems (12.54 KB / DOCX)

Experimental data chapter 7 MS1 data (31.93 KB / CSV)

Experimental data chapter 7 Answers (257.74 KB / PDF)