Chapter 11. Power analysis

This chapter explains what power analysis is and why it is essential when designing a study.

In short, low-powered studies cannot be trusted.  This raises the question: how often are studies insufficiently powered?  As we discuss, at least within the biological sciences, most published studies appear to be underpowered.  We also discuss the fallacy of the argument, “I know my study was underpowered, but my results reveal a large effect size, so my results must be important!”

Power analysis typically aims to design studies with a high probability of obtaining p < 0.05 when an effect of a given size exists.  Those who have read/watched the chapter “Abandon statistical significance” might wonder whether this approach (p < 0.05) remains appropriate.  The answer is yes, albeit with a different interpretation: power analysis with a goal of achieving p < 0.05 is equivalent to designing a study to have a high probability of detecting “moderate” or “suggestive” evidence for an effect.  Researchers can use a smaller cutoff p-value (e.g., p < 0.005) to design studies that can provide “strong or substantial” evidence for an effect.

Power analysis is usually conducted using available software (e.g., G*Power, or the ‘pwr’ package in R).  These approaches are extremely useful, and we demonstrate some power analyses using G*Power in this and the following chapters.  However, we also follow Colegrave & Ruxton’s (2021; see recommended reading) example of using simulations to conduct power analysis.  While simulations require a bit of extra effort, they also allow you to do things that cannot be done with standard software.
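To make the simulation idea concrete, here is a minimal sketch of simulation-based power analysis for a two-sample t-test.  We use Python here purely for illustration (the same logic carries over to R); the function name and default settings are our own, not part of any package:

```python
import numpy as np
from scipy import stats

def simulated_power(n_per_group, effect_size, alpha=0.05,
                    n_sims=5000, seed=1):
    """Estimate power of a two-sample t-test by simulation.

    Repeatedly draws two normal samples whose means differ by
    `effect_size` (in SD units) and counts how often the test
    rejects at the chosen `alpha`.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(effect_size, 1.0, n_per_group)
        rejections += stats.ttest_ind(a, b).pvalue < alpha
    return rejections / n_sims

# For n = 64 per group and d = 0.5, the estimate lands near the
# conventional 0.80 benchmark:
print(simulated_power(64, 0.5))
```

Because the p-value cutoff is just an argument, the same function handles a stricter design goal (e.g., `alpha=0.005`) with no extra machinery, which is exactly where simulation starts to pay off relative to fixed software menus.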

For example, when using standard software, power analysis for a 1-Factor General Linear Model (and for more complex models, too) is based on the p-value for the overall effect of a factor, not on the post-hoc tests (e.g., Tukey tests) that usually follow.

Therefore, a power analysis based on standard software can easily lead a researcher to design a study that has high power to detect a factor’s overall effect, but lower power for the post-hoc tests.  This may be undesirable, and it can be remedied using simulations.  Alternatively, simulations allow a researcher to design a study around a goal other than obtaining p < 0.05.  For example, one can design a study with high power to estimate an effect size with a given level of precision.
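For instance, to design for precision rather than significance, one can simulate the confidence-interval width that the planned design would typically produce.  The Python sketch below does this for the difference between two group means; the function name and the sample sizes tried are illustrative:

```python
import numpy as np
from scipy import stats

def median_ci_halfwidth(n_per_group, sd=1.0, n_sims=2000, seed=1):
    """Simulate the typical 95% CI half-width for a difference
    between two group means, to choose n for a target precision."""
    rng = np.random.default_rng(seed)
    halfwidths = []
    for _ in range(n_sims):
        a = rng.normal(0.0, sd, n_per_group)
        b = rng.normal(0.0, sd, n_per_group)
        se = np.sqrt(a.var(ddof=1) / n_per_group
                     + b.var(ddof=1) / n_per_group)
        t_crit = stats.t.ppf(0.975, df=2 * n_per_group - 2)
        halfwidths.append(t_crit * se)
    return float(np.median(halfwidths))

# Increase n until the typical half-width falls below your target;
# it shrinks roughly in proportion to 1/sqrt(n):
for n in (25, 50, 100):
    print(n, round(median_ci_halfwidth(n), 2))
```

A researcher would pick the smallest n whose typical half-width is below the precision they consider scientifically useful, with no p-value involved in the design decision at all.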

This approach has several advantages, discussed later in this chapter.  Finally, a researcher can use simulations to conduct power analysis for any type of analysis, whereas standard software covers only a limited set.  For example, the chapter ‘Mixed effects models’ provides resources for conducting power analysis (via simulations) for such models, while software like G*Power cannot perform power analysis for mixed effects models.

By the time you complete this chapter, we expect you to appreciate the following argument: an experiment will always yield useful results if it has high power to detect the smallest effect size that is of biological interest.  As will become clear, such experiments have a high probability of detecting an effect if it exists (and in this case, the results should be easy to publish).

On the other hand, it is also very informative when such an experiment fails to detect an effect, because the researcher can argue that any effect, if it exists, is likely too small to be of biological interest.  Hence, provided the hypothesis is interesting and the experiment is otherwise well designed, studies with high power to detect the smallest effect size of biological interest always yield interesting (and likely publishable) results.