Okay, in this video we are going to address the question that's right here. Let's imagine you conducted an experiment and you obtained a particular p-value from that experiment. Now let's imagine that you repeated that experiment exactly and analyzed the data for that new experiment. Do you expect your new p-value to be similar to the original p-value? That's a good question, and it's a question that's tackled really nicely in this beautiful paper, "The fickle P value generates irreproducible results." What they do in this paper is address the expectation that if one experiment produces one p-value, then a repeat of that original experiment should yield a similar p-value. And what they show, and what I'm going to show you, is that this expectation is usually not going to be met. Okay? To do this, I'm picking up right where we left off in our previous video, where I showed you how to use simulations to conduct a power analysis for a t-test. Okay, so here is our code. We specified two different means and a standard deviation. We specified a sample size of five. We set a counter in order to count the number of times that we found a significant result, and we ran our simulations 10,000 times. So we generated some data for one group, generated data for another group, ran a t-test on those two groups, and saved that output in an object which we called t.out. And then we said: if the p-value for our test is less than 0.05, then add one to our counter, because we consider that another instance in which we detected a difference between our groups. So if we just run this again, I'm going to get a very similar outcome to what we found before. What we found here is that 78% of the time, or for 78% of our simulations, we did detect a difference between our groups according to this criterion of a p-value of less than 0.05. Another way of saying this is that, according to the simulations, an experiment like this would have about 78% power to detect a difference of one unit between our two groups using a t-test when the standard deviation is equal to 0.5. Okay? I just wanted to review what we did there, because in order to investigate this question about how similar we expect our p-values to be among experiments, we're just going to modify some of the code that we have here. So I wanted to go over the code to reorient you to the code we already have. Okay? What we're going to do is simply save every single p-value that we obtain throughout the simulations. To do that, I'm going to create a vector which I'll call p.val.sims, and I'm going to make it an empty vector, so I'm just going to set it to NULL. And now we're not going to evaluate whether the p-value is less than 0.05, because we want to keep every single p-value that we generate. Okay? Because what we're essentially going to be doing in our simulations here is running an experiment where the true difference between the groups we're comparing is equal to one unit, because one group has a mean of five and the other group has a mean of six. Okay? And in our experiment, the true standard deviation is 0.5. Okay?
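For reference, a minimal sketch of the power-analysis simulation described above might look something like this. The actual script isn't shown here, so the variable names (m1, m2, stdev, n, counter, t.out) are just illustrative placeholders:

```r
# Sketch of the power simulation; variable names are illustrative placeholders
m1 <- 5        # true mean of group 1
m2 <- 6        # true mean of group 2
stdev <- 0.5   # true standard deviation of both groups
n <- 5         # sample size per group
counter <- 0   # counts the number of significant results

for (i in 1:10000) {
  group1 <- rnorm(n, mean = m1, sd = stdev)  # simulate data for group 1
  group2 <- rnorm(n, mean = m2, sd = stdev)  # simulate data for group 2
  t.out <- t.test(group1, group2)            # run the t-test
  if (t.out$p.value < 0.05) {
    counter <- counter + 1                   # tally a significant result
  }
}

counter / 10000  # estimated power: should come out around 0.78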
And so what we're going to do is run an experiment with these parameters 10,000 times, and we're going to save the p-value for every single time we run that experiment. Okay? So we say p.val.sims is c of p.val.sims and the new p-value. What we're doing here is taking our original vector that contains the p-values, and we're just adding onto that vector the p-value that we obtain from the output of our t-test. When we concatenate those things together, we're effectively just updating our vector of p-values, and we save this as a new version of itself, which we have here as p.val.sims. So what this line of code is basically doing is just updating our vector p.val.sims by adding on the new p-value that we've obtained from our t-test. Okay? With this in mind, we're just going to plot a histogram of these data. So we're going to say hist of p.val.sims, and I'll show you this in a moment. Okay, so let's just highlight all of this and... actually, one more thing. We're going to consider this question for a variety of levels of statistical power. I'm going to start by considering the case where we have lower power than we had before. Originally our power was around 80%. Since I've decreased the sample size from five to two, we're now going to be looking at our distribution of p-values when we have low power. Okay? With this sample size, our power is about 20 percent. Okay? And if you recall, this is generally what we might expect in a number of areas of biology. We've talked about this in previous videos, saying that this may be the typical power for a number of studies in neuroscience and for a number of studies in ecology and evolution, at least if we're looking for small effect sizes. Okay? So if we conduct an experiment with this amount of power, let's look at the distribution of p-values that we get. Here it is: this is a histogram of our p-values, and you can see that they run the full range from being relatively small all the way up to one. Okay, let's make this figure just a little bit more refined, because our bars are very big. I'm just going to specify more breaks; we'll say a thousand breaks, which should smooth that out a bit. Okay, so there we go. What we can see now is that the bulk of our distribution of p-values lies probably about there. But what we should see immediately is that when we run an experiment with low power like this, it's entirely feasible for one experiment to give a p-value of, say, around 0.1 or even much lower than that, and another experiment to give a p-value of, say, 0.5. Okay? And it's not totally unreasonable to expect a p-value all the way up near one. Okay? The bulk of the p-values lie in this range, say between being very small and up to around 0.4. So the point here is that when we have a low-powered experiment, we expect very little consistency among the p-values that we would obtain among identical experiments. So if we tried to replicate, that is, perform again, a low-powered experiment, we have very little reason to believe that the p-value from one experiment should be similar to that of a follow-up experiment, okay? Even if that follow-up experiment is basically repeating the first experiment exactly. So when we have low power, we should not expect consistency among p-values when we're repeating an experiment.
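Putting those modifications together, the loop might look something like this. Again, this is a sketch, with the same illustrative names as before:

```r
# Sketch of the modified loop: instead of counting significant results,
# we keep every p-value; names are illustrative placeholders
m1 <- 5; m2 <- 6; stdev <- 0.5
n <- 2               # small sample size: power is only about 20%
p.val.sims <- NULL   # empty vector to collect every simulated p-value

for (i in 1:10000) {
  group1 <- rnorm(n, mean = m1, sd = stdev)
  group2 <- rnorm(n, mean = m2, sd = stdev)
  t.out <- t.test(group1, group2)
  p.val.sims <- c(p.val.sims, t.out$p.value)  # append the new p-value
}

hist(p.val.sims, breaks = 1000)  # many breaks gives a finer-grained picture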
What about if we have around 80 percent power, which is what we had earlier? We had about 78-79% power when our sample size was five. What do we get in this case? Okay, here we can see something much better. With relatively high power, our p-values are now much more concentrated down here. But you can still see that there's a long tail going up to there, or at least that's how it appears to me. Let's focus in a little bit and try to get a better look at what's going on here. To do that, let's just say we want our x-axis to range from 0 to 0.1. Okay, so we're going to look only at the part of this figure to the left of this point here. Okay? In this case, you can see that on this scale, once again, even with relatively high power, we can still see a fair degree of variation in our p-values, at least among our significant p-values. Remember, in this case we have about 78-79% power. That means that 78 or 79% of our p-values among all of our experiments are going to lie to the left of this point. So by far the bulk of our p-values lie to the left of 0.05; that's what's implied by this value here, which basically says that 79% of our p-values are smaller than 0.05. But within that range, you can see it's really quite reasonable to expect a very small p-value in one experiment and a p-value that's much closer to 0.05 in another experiment. From another perspective, though, consider the fact that we have a power of only around 80 percent. What that means is that if we repeat an experiment... sorry, let me say that again. Let's imagine we conducted an experiment that has 80 percent power and we get a p-value that's less than 0.05. If we repeat that experiment, we only expect to get a p-value of less than 0.05 in that repeated experiment 80% of the time. Twenty percent of the time, we would expect the repeated experiment to give us a p-value of greater than 0.05. Okay, so that's another way of thinking about the amount of variation we would expect to have among our p-values. Let's just make this even more optimistic now and consider a higher-powered situation. In this case, we have close to 95% power; this indicates that with a sample size of eight, we have about 96% power. Okay? And in this case, what we're going to see is that now, finally, with really high power, so power of more than around 95%, the vast bulk of our p-values are pretty small, pretty much all in the same region, because now the vast majority of our p-values are less than 0.01. Okay? So we're now starting to get to a situation where the vast majority of our p-values are starting to be more impressive. Let's wrap this up. The point of this video is that the extent to which we expect there to be similarity among p-values, even if we repeat an experiment exactly, will depend on the power of that experiment. If we have low-powered studies, then we would expect the p-values to vary wildly among repeated experiments. But as power increases, our p-values will tend to become more and more similar among experiments. In order for our p-values to become relatively consistent, however, we need to have very high power, say greater than 90 percent. Okay? This is a value that is quoted in the paper here; this is where I got that "greater than 90 percent" idea. Okay?
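To reproduce those comparisons yourself, you can zoom in on the histogram and re-run the loop at different sample sizes. A short sketch, assuming the objects from the loop above (xlim is a standard argument to hist()):

```r
# Zoom in on the small p-values by restricting the x-axis to 0-0.1
hist(p.val.sims, breaks = 1000, xlim = c(0, 0.1))

# The fraction of p-values below 0.05 estimates the power; re-running the
# loop with n <- 5 should give roughly 0.78, and with n <- 8 roughly 0.96
mean(p.val.sims < 0.05)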
So I think that this is a really good exercise for getting a better perspective on the randomness of p-values, or the extent to which p-values can vary among experiments simply due to sampling error. Okay? And I think one of the really big take-home messages from this exercise is the stochastic nature, the random nature, of p-values, and how susceptible they are to sampling error when you have relatively low power. This really warns us that when we're drawing conclusions about our experiments, it's really important to consider evidence beyond a p-value. If we put all of our weight on the p-value when making a conclusion about an experiment, then, especially if we're using low-powered experiments, we could be putting ourselves in a heap of trouble, because, as we demonstrated here, those p-values can be highly stochastic. So we want additional forms of information in order to make more robust conclusions. Okay? We're going to stop the video there. I hope this video has been helpful, and thank you very much.