Okay, in this video we're going to talk about power analysis. Specifically, we're going to ask what statistical power is, and we're also going to briefly discuss how calculating statistical power can be useful in research. Before going on, though, I need to give credit where it's due and thank Nicole Grave, who developed the vast majority of these slides and very graciously offered them to me, and I loved them. So thank you very much, Nick. Anyone who's designing an experiment needs to ask themselves how many replicates they're going to need. And the answer is that they need enough, but also not too many, because bigger experiments are not always better. An experiment that's too big can be a waste of time and money, and most importantly, it can be a waste of animals' lives. Large experiments can also be less well done, because they can be more difficult to manage. It's very important for anyone who performs experiments with animals to be able to say that they limited their sample sizes in order to reduce animal suffering. There is a flip side to this, however: if sample sizes are reduced too much, then an experiment's reliability can become compromised. And if an experiment is no longer reliable, the experiment may be entirely useless, in which case all the animals that were used in that experiment were used for no reason at all, and their lives were wasted. So how do we decide between these two extremes of having too few versus too many samples? There are some bad ways to decide: for example, to just continue doing exactly as you've always done in the past, or to just copy what everyone else does, or to let your budget decide how many samples you're going to use in your experiment. A much better way to decide is to make an informed choice using power analysis. So, what is statistical power?
Statistical power is the probability of detecting an effect in your experiment, assuming that there is an effect there to detect in the first place. And an experiment's statistical power depends on three things. First, it depends on the size of the effect the experiment aims to detect: on average, all else being equal, experiments that aim to detect larger effect sizes will be more statistically powerful, because it's easier to detect larger effects. Statistical power also depends on the amount of variation that's inherent in your data, because the more variable your data are, the more difficult it will be to detect an effect, and so more variable data sets will tend to lead to lower statistical power. And finally, statistical power depends on sample size, where increased sample size will increase statistical power. I want you to notice something in the language that I'm using, where I've been talking about the probability of detecting an effect. That language inherently implies a focus on p-values and using p-values to make conclusions about your data. If instead you wish to analyze and interpret your data in terms of effect sizes and confidence intervals on those estimated effect sizes, then I'd like to point out that the logic we're using in this video applies equally to designing experiments that you want to interpret in terms of effect sizes and the confidence intervals on those effect sizes. To explain power analysis, I think the best thing to do is simply walk through an imaginary experiment, where our imaginary experiment is going to involve adding steroids to water, to test whether or not those steroids increase the size of fish. We're going to design an imaginary experiment to answer this imaginary question, and we want to know what the power of our imaginary experiment is.
To determine that power, we need to determine the effect size we're interested in detecting, the amount of variation in our data, and the sample size we're considering for our imaginary experiment. Our imaginary experiment is going to have imaginary background biology, where we're going to imagine that we're working with a species of fish that has a mean weight of five grams. And just like real fish, our imaginary fish are not all going to be exactly the same size; our imaginary fish are going to have variation in their size. Specifically, we're going to imagine that the weight of our fish species is normally distributed with a standard deviation of 0.5 grams. And finally, we're going to imagine that the treatment, that is, the addition of steroids, is only biologically interesting, only biologically relevant to us, if the steroid addition increases weight by at least one gram. So here's our view of our imaginary world, where we have two groups of fish. We have control fish, which have a mean weight of five grams; their weight is normally distributed, and that normal distribution, which has a mean of five grams, also has a standard deviation of half a gram. Our treated fish, the fish that experienced the steroid, also have a normally distributed weight, and that normal distribution has a standard deviation of 0.5 grams, just like the control fish. This is an assumption that we're making: we're assuming that adding steroids will not change the variation in the weight of the fish. This is a standard assumption. It doesn't necessarily mean that it's true, but it's an assumption that often is true, and it's an assumption that's made when performing many statistical analyses, including standard two-sample t-tests and analysis of variance. We're also going to imagine, however, that our treated fish have a different mean.
So the mean weight of our treated fish is going to be one gram larger than the mean weight of our control fish. And what we're going to ask is, given this view of the world, will we be able to detect this effect of our steroid for a given sample size in each of our treatments? We're specifically going to answer that question by asking whether or not we would be able to detect this difference if our imaginary experiment had five fish in the control treatment and five fish in the treated treatment. To answer this question, we're going to simulate this world, where we're going to generate five control fish by drawing five random numbers from a normal distribution with a mean of five and a standard deviation of 0.5. I'll just go back a slide for a moment and point out that this idea of a normal distribution with a mean of five and a standard deviation of 0.5 exactly matches our description of our control fish, where we said our control fish have a weight that is normally distributed with a mean of five grams, and that normal distribution has a standard deviation of 0.5 grams. We're also going to generate five experimental fish, fish that experienced the steroid, by drawing five random numbers from a normal distribution again, but this time the normal distribution has a mean of six and a standard deviation of 0.5. We're then going to take those five data points we got for our control fish and the five data points for our experimental fish, put them together in some statistical package, run a statistical analysis, and ask whether or not we can detect the difference between these two groups. Before we do that, though, I just want to address this idea of simulating the world a little more, mostly because I can imagine that some people might find it a little bit strange to be simulating the biological world, and some people might be asking themselves what simulated data have to do with the real world.
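If you'd like to see what one run of this imaginary experiment looks like, here's a minimal sketch in Python using NumPy and SciPy (this code is not from the video; the group sizes, means, and standard deviation just follow the setup described above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible

# Draw five control fish and five treated fish from the two imagined
# normal distributions: means of 5 g and 6 g, common SD of 0.5 g.
control = rng.normal(loc=5.0, scale=0.5, size=5)
treated = rng.normal(loc=6.0, scale=0.5, size=5)

# A one-way ANOVA with two groups (equivalent to a two-sample t-test)
# asks whether the simulated difference is detectable in this sample.
f_stat, p_value = stats.f_oneway(control, treated)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

Each run of this script is one simulated experiment; whether p falls below 0.05 varies from run to run, which is exactly what the repeated simulations below will count up.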
To address that concern, I want you to consider the example of human height, where it's well known that human height is well approximated by a normal distribution. Consider this photo here, a photo of students from 1941, where the students were lined up according to their height. We had one individual who was four foot ten, one individual who was five foot one, say seven individuals who were five foot four, and our most common height was five foot eight, where we had 27 individuals. You can see that even with this relatively small sample of individuals, the height of the individuals in this population is well approximated by a normal distribution. Now I want you to imagine for a moment that we're not dealing with fish but with humans, and we want to conduct a very simple experiment where we want to estimate the average height of humans in this population. For that very simple experiment, we might go into this population, randomly sample five individuals from this normally distributed population of humans, take those five humans and measure their heights, and that gives us five data points of human heights. Now I want you to compare that situation to one where we take a normal distribution that has the same mean and the same standard deviation as this human population, and we randomly sample five numbers from that normal distribution. Those five numbers that were randomly drawn from that normal distribution will have the same statistical properties as the heights of those five humans that we randomly sampled from this population.
So I hope it's clear from this example that we can use simulations to generate data that very closely match the populations we hope to be learning about. So back to our fish. Let's imagine we conducted this simulation once, where we got five control fish and five experimental fish, and the five control fish had a mean weight of 4.82 grams and the experimental fish had a mean weight of 6.01. We took those data, put them into a stats program, and analyzed them. And here's our test statistic: F, with 1 and 8 degrees of freedom, is equal to 13.14, which for this number of degrees of freedom corresponds to a p-value of 0.006. So for this first simulation, we were able to detect a significant difference between our control fish and our experimental fish. Before we go on, I want to point out that if our F value had been 5.32, then for this number of degrees of freedom an F value of 5.32 would give us a p-value of 0.05. So F being equal to 5.32 represents the cutoff between what's formally statistically significant versus statistically non-significant; you'll see a figure coming up in a few slides where that's relevant. Let's imagine we repeated this simulation another time, and again we got a new mean for our control fish and a new mean for our experimental fish. In this case, we did detect a significant difference between these groups, but just barely; this p-value is very close to 0.05. So imagine we repeated this again. Whoops. In this case, we did not detect a significant difference between these groups, even though we know that there truly is a difference between the control fish and the experimental fish, because we drew the random numbers for our control fish and for our experimental fish from different normal distributions that had different averages. Let's imagine we continued the simulation ten times and plotted the results here.
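That 5.32 cutoff isn't arbitrary: it's the critical value of the F distribution with 1 and 8 degrees of freedom at alpha = 0.05 (two groups of five fish give 1 between-group and 8 within-group degrees of freedom). As a quick sketch, you can reproduce it with SciPy:

```python
from scipy import stats

# Critical F at alpha = 0.05 for 1 and 8 degrees of freedom
# (two groups, ten fish total: df_between = 1, df_within = 8).
f_crit = stats.f.ppf(0.95, dfn=1, dfd=8)
print(round(f_crit, 2))  # approximately 5.32
```

Any simulated F ratio above this value counts as a "significant" result in the simulations that follow.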
Along the x-axis we have our values for our F ratio, our test statistic, and this blue line here represents our critical F value of 5.32. Any F values that are greater than 5.32 correspond to simulations where we found a significant difference between our two groups of fish, whereas F values that are less than 5.32 correspond to cases where our simulations did not reveal a significant difference between our groups of fish. The y-axis here just counts the number of cases that we have for our various F ratios. If we were to continue the simulations, say to a hundred times, we might get a distribution that looks like this. If we continued to do 1,000 simulations, then we might get a distribution that looks like this. A thousand simulations is a reasonable number for conducting a power analysis. And what did we find after running these 1,000 simulations? Well, we found that there were 798 cases where our F value was greater than 5.32. In other words, out of the 1,000 simulations that we conducted, 798 of them revealed a significant difference between our two groups of fish, between our control fish and our experimental fish. What this means is that our experiment had 79%, or actually 79.8%, power to detect an effect of one gram, that is, to detect a change in mass from five grams to six grams, when our populations had a standard deviation of 0.5 in their masses and when our experiment had five fish in each of our treatments. So under those circumstances, an experiment that conforms exactly to those circumstances will have 79.8% power to detect a significant difference between our two groups of fish. I just want to point out that we used simulations in order to explain what power analysis is.
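The whole procedure described above, repeat the imaginary experiment many times and count the fraction of runs that come out significant, can be wrapped up in a few lines. Here's a minimal sketch (again Python with NumPy/SciPy, not from the video; the function name and defaults are just illustrative):

```python
import numpy as np
from scipy import stats

def simulated_power(n_per_group=5, mean_control=5.0, mean_treated=6.0,
                    sd=0.5, alpha=0.05, n_sims=1000, seed=42):
    """Estimate power by repeating the imaginary fish experiment many times."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(mean_control, sd, n_per_group)
        treated = rng.normal(mean_treated, sd, n_per_group)
        _, p = stats.f_oneway(control, treated)
        if p < alpha:
            hits += 1  # this run detected the true difference
    return hits / n_sims

print(simulated_power())  # should land near 0.8, matching the video's ~79.8%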
Typically, for an experiment as simple as the one we're imagining, simulations would not be necessary, because statisticians have already done the mathematics to determine the relationships between effect size, variation, sample size, and power. So if you were to conduct a power analysis in a standard statistical package, that package is unlikely to be using simulations; it will be using the mathematics that statisticians developed instead. So let's just recap: what is statistical power? Statistical power refers to the probability of detecting an effect of a given size, assuming that effect size exists in the first place. I want to point out, though, that it's meaningless to simply say that my experiment has 80 percent power and just leave it there. That's because it's meaningless to discuss an experiment's power without specifying the effect size, because the power of an experiment is specific to that effect size. An experiment that has high power to detect large effects could very well have low power to detect more subtle effects. So whenever we discuss an experiment's power, we have to discuss that power specifically in the context of the effect size that the experiment is designed to detect. When is power analysis useful? Well, as we've just walked through, you can use power analysis to decide how many samples you need for a particular experimental design you might be considering. Power analysis, though, can also be useful for weighing the pros and cons of different experimental designs. For example, let's imagine we were considering two experimental designs for a particular experiment, where in both experiments we would be using 48 mice. If one of these experiments had more power than the other, then we'd be inclined to use that experiment. Alternatively, we might try doing something like actually reducing the number of mice that are present in one of our experimental designs.
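As a sketch of what those closed-form calculations look like, here's one way to reproduce the roughly 79.8% figure without any simulation, using the noncentral t distribution for a two-sided two-sample t-test (this assumes the video's simple design maps onto a standard t-test power calculation, which it does for two groups with equal variance):

```python
import numpy as np
from scipy import stats

# Analytic power for a two-sample t-test: effect = 1 g, SD = 0.5 g,
# so the standardized effect (Cohen's d) = 1 / 0.5 = 2, with n = 5 per group.
d, n, alpha = 2.0, 5, 0.05
df = 2 * n - 2                    # 8 degrees of freedom
ncp = d * np.sqrt(n / 2)          # noncentrality parameter of the test statistic
t_crit = stats.t.ppf(1 - alpha / 2, df)

# Power = probability the noncentral t statistic lands beyond the critical value.
power = (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)
print(round(power, 3))  # close to the 0.798 found by simulation
```

This is the kind of mathematics a statistical package evaluates internally when you ask it for power, which is why its answer appears instantly rather than after a thousand simulated experiments.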
And if we found that one of our experiments had equivalent power to another experiment but used fewer mice, then ethics would drive us to choose the experiment that used fewer animals. In conclusion, I want to point out that when we're designing experiments and performing a power analysis, the questions of design are often biological rather than statistical. That should be most obvious when we're trying to think of an appropriate effect size for our experiment. Another major point to draw from this video is that a power analysis might indicate to us that the perfect experiment we might be imagining might simply not be possible. But power analysis can still be really useful by helping us to understand the limits of the studies that we might actually be able to perform.