Okay, in this video I want to talk about an alternative approach to conducting power analyses. To introduce this way of thinking, I want to start by pointing out a possible discrepancy you might have picked up on if you've watched enough of these videos. In one series of videos I followed advice from the American Statistical Association and criticized the practice of comparing p-values against some arbitrary threshold, usually 0.05, when drawing conclusions from an experiment. In other words, I criticized the whole notion of statistical significance. In that series of videos I advocated instead that we should interpret p-values along a continuum: p-values that are very large constitute weak evidence for an effect, whereas p-values that are small constitute strong evidence for an effect. That's one set of arguments. On the other hand, you'll have noticed in this series of videos on power analysis that the standard approach is to adopt exactly that criterion of comparing the p-value against an arbitrary threshold of 0.05. Those two perspectives seem to be at odds with one another.

So what I'd like to do first in this video is present a perspective that reconciles those two observations. Then the main point of the video is to provide a whole other way of conducting power analyses that circumvents this apparent discrepancy entirely.

Let's start with the perspective that reconciles the two views. How do I resolve the apparent discrepancy? Let's return to the arguments I made in the series of videos criticizing the notion of statistical significance. There, as I've already mentioned, I advocated an interpretation of p-values along a sliding scale: very large p-values should be considered weak evidence for an effect, whereas small p-values should be considered strong evidence. Specifically, I pointed to p-values of around 0.005 (note the extra zero) and argued that p-values of around 0.005 or smaller can be considered substantial, or strong, evidence for an effect. I also argued that p-values of around 0.05 might be considered moderate evidence for an effect.

If we adopt that way of thinking, then we can argue the following: if we've designed an experiment to have 80 percent power using the criterion of p less than 0.05, then that experiment has a high probability of detecting evidence for an effect of the specified size, where that evidence is at least moderate. I'm going to say it again, because I don't feel I said that very articulately. If p-values of around 0.05 constitute moderate evidence for an effect, and p-values of around 0.005 or smaller constitute strong evidence, and we design an experiment to have high power, say 80 percent, to detect an effect of a given size using the criterion of p less than 0.05, then what we are really doing is designing our experiment to have a high probability of detecting an effect of that size with at least moderate evidence for that effect.
Okay, that's the perspective I adopt when interpreting power analyses that use the criterion of comparing a p-value against 0.05: I interpret the power analysis in terms of our ability to detect an effect with at least moderate evidence for that effect. That's the perspective I wanted to bring to your attention to reconcile what might look like a discrepancy between these two different ways of using p-values.

In this video I want to talk about a whole other perspective that lets us circumvent this apparent discrepancy entirely. The idea is that instead of conducting a power analysis that uses a p-value threshold as its criterion for a successful experiment, we can conduct a power analysis in which we design our experiment to be able to estimate an effect size to a certain level of precision, where our criterion for that level of precision is the standard error of our effect size. Now, I recognize that everything I just said probably sounds like gobbledygook, or at least a little bit vague, so our next goal in this video is to explain what I mean in more detail.

Before I do that, though, I want to bring this book to your attention: another book by Nick Colegrave and Graeme Ruxton, which came out in 2020 and is, I believe, called Power Analysis: An Introduction for the Life Sciences. It's a beautiful book. In it they talk about exactly the methods we're going to cover in this video, including how we can use simulations to conduct power analyses, and they discuss statistical power more generally. It's highly accessible; it's written for undergraduate students, but it will also be very useful for, say, professors or principal investigators. I've drawn on this book in my other videos on power analysis but simply forgot to acknowledge it there, so I want to make sure everyone is aware of it, because it's an extremely useful book.

Okay, now back to the question at hand: how can we conduct a power analysis so that we design an experiment with a certain level of power to estimate an effect size with a certain level of precision? That's what we're going to talk about next.

Let's imagine we want to conduct a power analysis for an experiment where the effect size we're interested in is equal to 1. That's what I'm trying to communicate with this very simple figure, where we have a single dot, with effect size along the y-axis, and that dot represents an effect size of 1 for our experiment. To make this a little more concrete, imagine an experiment in which we want to compare some measurement under two different treatments, a drug treatment and a placebo treatment, and we consider the difference between the average of the drug treatment and the average of the placebo treatment. That difference between the two means is our effect size.
And what I'm going to argue in this video is that we can conduct a power analysis in which we try to estimate that effect size to a certain level of precision, and we can use that criterion as the focus of the power analysis. To be specific: if we imagine that a reasonable effect size for our experiment would be around 1, so we expect a difference of about 1 between our two groups, then what I'm advocating is that we conduct a power analysis in which we try to estimate an effect size like this with a certain level of precision. In other words, when we conduct our experiment, our goal is to estimate the effect size in such a way that the error bars around it fall within a certain range.

How do we come up with a specific range of values that we would like to fall around our effect size? To do that, I want to remind you of our discussions of 95% confidence intervals. We had a series of videos on 95% confidence intervals, and we talked about how, very roughly, we can view a 95% confidence interval as providing a plausible range of values. Before I go on, I want to highlight that when we adopt that perspective we are making a particular assumption, which comes from the fact that we're adopting an arbitrary level of certainty: we've chosen a level of certainty of 95%, as opposed to, say, calculating a 99% confidence interval. In other words, when we use 95% confidence intervals, we're saying that we want to specify a plausible range of values for something given this 95% criterion. If that's not clear, I'd point you back to those other videos on 95% confidence intervals to clarify the concept. The point I'm trying to make in the context of this video is that we can use 95% confidence intervals to imagine a plausible range of values around our effect size.

The second thing I'd like to point out is the relationship between 95% confidence intervals and the standard error of an estimated effect size. In our previous videos on 95% confidence intervals, we talked about how, very roughly, a 95% confidence interval is about twice the size of the standard error, and this is particularly true when we have a relatively large sample size.

So here's what I'm building up to. I started by saying that we can conduct a power analysis whose goal is to estimate an effect size to a certain level of precision. I'm arguing that we can use the 95% confidence interval around our estimate of the effect size to provide a plausible range of values for that effect size; in other words, we can use the 95% confidence interval as our criterion for the precision with which we want to estimate the effect size. And we can go a step further: to construct that 95% confidence interval, we can calculate the standard error of our effect size.
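To make the arithmetic behind this concrete, here is the rough relationship I'm relying on (an approximation that gets better as the sample size grows; the symbols are just notation I'm introducing here, not anything from the slides):

$$
\text{95\% CI} \;\approx\; \hat{\Delta} \pm 2 \times \mathrm{SE}(\hat{\Delta}),
$$

where $\hat{\Delta}$ is the estimated effect size (the difference between the two group means) and $\mathrm{SE}(\hat{\Delta})$ is its standard error. So if we want the 95% confidence interval around $\hat{\Delta}$ to have a half-width of some value $w$, we can aim for a standard error of roughly $w/2$.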
When we adopt this approach, we can conduct a power analysis to ask the question: what sample size do I need in order to estimate an effect size of, say, 1 with a certain level of precision, based on the standard error of the effect size, say 80 percent of the time? That's the perspective we're going to adopt.

Now let's get a little more concrete. Imagine that in our experiment with the drug and placebo treatments we want to estimate the difference between our two groups with a confidence interval of plus or minus, say, 0.7. In other words, if we imagine that the effect size we're interested in being able to detect is around 1, then we can formulate a power analysis in which we want to estimate this effect size with a 95% confidence interval of plus or minus 0.7. And if we want to estimate the effect size with a 95% confidence interval of plus or minus 0.7, we can do that by focusing on standard errors of the effect size of about 0.35, because, roughly speaking, the standard error is about half of the 95% confidence interval around our estimate.

Let's make this even more concrete and implement it in R. We're going to revisit the same code we used to simulate a power analysis for a t-test, so I'll just remind you of those simulations to get our feet wet and get ourselves grounded for this alternative approach. In a previous video, I talked about how we could use simulations to conduct a power analysis for a t-test. In that context I said, let's imagine that the mean value for one group is equal to 5 and the mean value for the other group is 6. Notice that this gives an effect size equal to 1, which is the same situation we were describing in the previous slides. We also said that the variation within each of our groups, so within our first group and our second group, can be described by a standard deviation of 0.5. We said that we're interested in determining the power of an experiment with an effect of this size and this amount of variation when we have a sample size of five in each group, and we ran our simulations 10,000 times, so we set the number of simulations equal to 10,000.

Then we conducted a series of simulations where, for each simulation, we generated five data points for each group, based on the mean value we expected for that group and the standard deviation we expected within each group. We saved those data points in one object called group 1 and another called group 2, representing our five imaginary data points for each treatment group. We then conducted a t-test comparing the data from group 1 to those from group 2, saved the output of that t-test in an object called t.out, and looked at the output saved in that object. We asked whether or not the p-value was less than 0.05; if it was, we said our experiment was a success, in other words we said we were able to detect the effect, and we increased a counter by one.
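For readers following along without the original script, here is a minimal sketch of the kind of simulation described above, using the p-value criterion. The object names (nSims, group1, group2, t_out) and the set.seed call are my own choices and may not match the original script exactly.

```r
# Simulation-based power analysis for a two-group t-test (p-value criterion).

set.seed(1)          # for reproducibility of this illustration
nSims <- 10000       # number of simulated experiments
n     <- 5           # sample size per group
mean1 <- 5           # expected mean of group 1
mean2 <- 6           # expected mean of group 2 (effect size = 1)
sd_w  <- 0.5         # standard deviation within each group

counter <- 0
for (i in 1:nSims) {
  group1 <- rnorm(n, mean = mean1, sd = sd_w)   # five imaginary data points, group 1
  group2 <- rnorm(n, mean = mean2, sd = sd_w)   # five imaginary data points, group 2
  t_out  <- t.test(group1, group2)              # t-test comparing the two groups
  if (t_out$p.value < 0.05) {                   # "success" = p-value below 0.05
    counter <- counter + 1
  }
}

counter / nSims   # estimated power; roughly 0.78-0.79, as quoted in the video
```

For this particular design you can also roughly cross-check the simulation against R's built-in power.t.test(n = 5, delta = 1, sd = 0.5), which should give a similar answer (it assumes equal variances, whereas t.test defaults to the Welch test).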
In other words, we went through a process where we counted the number of times our experiment was a success, where our criterion for success was obtaining a p-value of less than 0.05. We repeated this process 10,000 times and then asked what fraction of our simulations gave a successful outcome, and that was our estimate of power. If we run this code, you can see that using this approach we have about 78 to 79 percent power to detect an effect of this size, using the criterion of a p-value less than 0.05.

What I'd like to do now is show you how we can modify this script so that, instead of deciding whether an experiment was a success by comparing the p-value against 0.05, we consider an experiment a success if the standard error of our effect size is smaller than some given value. Back in our slides, we said that we would like to be able to estimate an effect size of, say, 1 to a certain level of precision, where the standard error of the effect size is 0.35 or smaller. We can build that criterion into the script very simply: instead of pulling the p-value out of the t-test output, we now pull out the standard error of the effect size. The t-test is estimating the effect size, the difference between our two groups, and it's also estimating the standard error of that effect size, which can then be used to calculate the confidence interval for the effect size. We can pull out the standard error it has estimated for our effect size and compare it against this value of 0.35. That's the size of standard error we would like: we want to conduct our experiment so that the error bars we generate around our effect size, the standard errors, are at least as small as 0.35. That's what we've specified here, and that's all we have to change about the nature of our power analysis.

We can now run this code again, so I'll just highlight all of it and run it. Here's the output: in this case we see that our power is equal to around 72 percent, or 71.5 percent. What that means is that if our goal is to detect an effect size of, say, 1, and we want to estimate that effect size to this level of precision, then with a sample size of five and this level of variation, so basically with this overall experimental design, we will be able to estimate the effect size with a standard error of 0.35 or smaller about 72 percent of the time. That's what this output now means.

Now let's imagine we want to be less picky. Let's imagine we're willing to accept a larger standard error for our effect size; in other words, we're being less picky about how well we can estimate it. Let's change the criterion so that instead of requiring the standard error of the effect size to be smaller than 0.35, we now require it to be smaller than 0.5. We're now allowing the error bars around our effect size to be even larger, so we're being less picky about how precisely we estimate that effect size.
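Here is a sketch of the same simulation with the precision criterion swapped in. I'm assuming the standard error of the estimated difference is available as the stderr component of the t.test() output, which it is in recent versions of R; the variable names and the set.seed call are again my own.

```r
# The same simulation, but with a standard-error (precision) criterion
# instead of a p-value criterion.

set.seed(1)
nSims     <- 10000
n         <- 5
mean1     <- 5
mean2     <- 6        # effect size = 1
sd_w      <- 0.5
se_target <- 0.35     # we want SE(effect size) to be at most 0.35

counter <- 0
for (i in 1:nSims) {
  group1 <- rnorm(n, mean = mean1, sd = sd_w)
  group2 <- rnorm(n, mean = mean2, sd = sd_w)
  t_out  <- t.test(group1, group2)
  # Standard error of the estimated difference between the two means.
  # (If your R version lacks t_out$stderr, it can be recovered as
  #  diff(t_out$conf.int) / (2 * qt(0.975, t_out$parameter)).)
  se_diff <- t_out$stderr
  if (se_diff < se_target) {          # "success" = estimate is precise enough
    counter <- counter + 1
  }
}

counter / nSims   # roughly 0.72, as in the video
```

Notice that nothing in this loop looks at the p-value at all; the only thing that changed is what we pull out of the t-test output and what we compare it against.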
If we're being less picky, then we'd expect to be able to estimate the effect size to this level of precision more often, so we'd expect this output to give us a larger number. Let's check that. I'll just run this code again. Now we can see that if we're less picky about how precisely we estimate our effect size, we're able to estimate it with a standard error smaller than 0.5, rather than 0.35, a larger fraction of the time. In fact, based on this criterion we can say that an experiment with this design, a sample size of five with a standard deviation of 0.5 in each group, should be able to estimate an effect size with a standard error smaller than 0.5 about 99 percent of the time. That's what that result means. Okay, let's change this back to our original value of 0.35 and run it again.

Now I'd like to show you one last thing, which is extremely cool. First, let's just run this a few times so you can see the variability we get in our power analysis. If we run it again, we estimate our power to be around 72.4 percent; again, 72.19 percent; run it again and we get 72 percent, 72.2 percent, 72.46 percent. The point is that because we're running simulations we're not getting exact probabilities, but we are getting probabilities of around 72 percent based on this criterion.

What happens if we change our effect size? Right now we're designing our experiment to detect an effect size equal to 1; in other words, the difference between this group and this group is one unit. What if we make the effect size bigger? Let's change the second mean from 6 to 60, so now our effect size is not equal to 1 but equal to 55. What happens now? If we run this again, you can see that our power has not really changed. What if we make the effect size much bigger still? I don't know how many zeros I've added there; now our effect size is on the order of millions. What happens? Our power remains pretty much the same.

This is pretty cool, and it's extremely useful. What I'm highlighting here, what I want you to see, is that for some forms of data and some forms of statistical tests, like the t-test, the standard error we calculate for an effect size is independent of the effect size itself. In other words, when the t-test calculates the effect size and the standard error of that effect size, the way the standard error is calculated does not depend on the effect size itself. That's extremely cool. Why is that cool? Think back to our video where we talked about how difficult it can be to decide on a particular effect size that interests you. That's a really difficult task; it's an essential and important task, but it's difficult. What I want to point out here is that for certain types of data and certain types of statistical analyses, like t-tests, if we adopt this perspective for conducting a power analysis, where our goal is to be able to estimate an effect size to a certain level of precision, then the effect size itself does not matter. That's what we've illustrated here.
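Here is a small sketch of the check just described: wrap the precision-based simulation in a function and call it with wildly different effect sizes to see whether the estimated power changes. The function name and arguments are my own, not from the original script.

```r
# Does the precision-based power depend on the effect size? Run the same
# simulation with very different differences between the group means.

power_by_precision <- function(mean1, mean2, sd_w, n, se_target, nSims = 10000) {
  successes <- 0
  for (i in 1:nSims) {
    group1 <- rnorm(n, mean = mean1, sd = sd_w)
    group2 <- rnorm(n, mean = mean2, sd = sd_w)
    t_out  <- t.test(group1, group2)
    if (t_out$stderr < se_target) successes <- successes + 1
  }
  successes / nSims
}

set.seed(1)
power_by_precision(mean1 = 5, mean2 = 6,   sd_w = 0.5, n = 5, se_target = 0.35)  # effect size 1
power_by_precision(mean1 = 5, mean2 = 60,  sd_w = 0.5, n = 5, se_target = 0.35)  # effect size 55
power_by_precision(mean1 = 5, mean2 = 5e6, sd_w = 0.5, n = 5, se_target = 0.35)  # effect size in the millions

# All three calls should return roughly the same value (around 0.72), because
# the standard error of the difference depends on the within-group variation
# and the sample size, not on the difference between the means.
```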
When we adopt this perspective of trying to estimate an effect size to a certain level of precision, and we do that by comparing the standard error of our estimate of the effect size against some value we're interested in, we get the same level of power irrespective of the actual effect size. In other words, when we design an experiment so as to estimate an effect size to a certain level of precision, then for certain types of analyses and certain types of data we don't need to worry about the actual effect size itself. That's huge. It's incredibly useful, because in those circumstances we no longer need to worry about exactly what our effect size would be.

So let's wrap this up; I'll go back to the slides. What I've tried to do in this video is point out that we can use criteria other than a comparison of a p-value against 0.05 to conduct our power analyses, in other words to determine the power of an experiment. I've illustrated that by showing how we can use the standard error of an effect size as our criterion in a power analysis when deciding how to design our experiment. I've also pointed out that for some types of analyses and some types of data, the standard error of the effect size does not depend on the effect size itself. If you're unsure whether a particular experiment or a particular analysis you're using falls into that category, you can easily check it for yourself using the approach I just used here: run a power analysis for one effect size, determine the power for that setup, then change the effect size to some ludicrously different value. If the power does not change, that suggests that for the approach you're using, the standard error of the effect size is independent of the effect size itself.

If your situation, your power analysis, falls into that category, then if our goal when designing an experiment is to be able to estimate an effect size to a certain level of precision, we free ourselves considerably when conducting the power analysis, because we can circumvent all of the difficult decisions surrounding how big the effect size would need to be. I've said that terribly, so let me try again: if our goal is to design an experiment so that we can estimate an effect size to a certain level of precision, then in contexts where the standard error is independent of the effect size itself, we do not need to think about what the exact effect size in our experiment is likely to be. We can sidestep that entire issue, which is incredibly useful.

I'm going to stop the video there. I hope this video has been helpful, and thank you very much.