Okay, in this video we're going to talk about another aspect of experimental design called blocking. Before we do that, I have a confession to make. I often draw at least a little bit from this book, Experimental Design for the Life Sciences, by Graeme Ruxton and Nick Colegrave, because it's an excellent book. And again, I highly recommend you either read it in the library or buy a copy for yourself. For this particular video, I must confess that I have followed the discussion in the chapter on blocking to a fairly tight degree, to a degree that I'm starting to feel a bit guilty. So that's my confession. But take that as a nod of appreciation to Graeme and Nick for doing such a good job on creating this book.
Before we get into discussing blocking properly, I want to walk up to that topic, starting by revisiting the imaginary experiment that we've discussed a couple of times in previous videos. In this imaginary experiment, we want to know whether or not petting puppies would change their average rate of tail wagging. And we imagined that these are what some of our subjects might look like. We said that a really good way to go about designing this experiment was to randomly allocate our subjects to our two treatments, the no-petting and the petting treatment. We emphasized that it's so important to randomly assign your subjects to your treatments, because this ensures that the treatment effect itself is the only systematic difference between our two groups. In other words, we can see there's lots of variation among our subjects. They come in different breeds, and they might be different sizes, different ages, et cetera.
And when we randomly assign our subjects to our different groups, what we essentially do is average over all of that kind of variation, so that, on average, all of those qualities of our subjects will be similar between our two groups. As a result, the only systematic difference between these two groups will be the petting treatment itself. Okay, that's why we said randomization is so important.
Let's imagine that our randomization procedure turned out like this, and that we had multiple individuals from our various breeds. So here we have two pugs in our no-petting treatment, we have two Labradors in each of our treatments, et cetera. What I'd like you to notice is that we have a lot of variation among our subjects. Our subjects might vary in their age, which I've tried to illustrate here by having these pictures of various sizes. The different-sized pictures can also stand for the different sizes of the dogs themselves. We have different breeds. The subjects might differ in their experience with humans, they might differ in their hunger levels, et cetera.
So what I'd like to point out is that even if there's no systematic difference between these two groups other than the treatment effect itself, we can still see that there's a lot of potential variation among our subjects within our treatment groups: age, size, breed, experience, et cetera. Now, if any of those traits that differ among our subjects can also influence the thing we're measuring, in this case the rate of tail wagging, then that variation in traits among the subjects will also lead to variation in the thing we're measuring, so in our Y variable, the rate of tail wagging. To be more specific, imagine that Labradors wag their tails more than pugs. I have no idea whether or not that's true, but let's imagine they did.
In that case, the fact that we have multiple breeds within each of our treatments is going to lead to variation in the rate of tail wagging within each of our treatments. Why is that important? That's important because the more variation we have within our treatments, the more difficulty we will have detecting differences between our treatments. Another way of saying that is that when we have more variation within our treatments, it becomes more difficult to estimate the size of the differences between our treatments. The reason for that is something we're going to go into in a lot of depth in other videos, particularly when we talk about power. But for the sake of this video, I just want you to accept as a fact, for the moment, that more variation within our treatments can make it more difficult to detect and estimate differences between our treatments.
Because of this, we might want to design our experiments in different ways depending on whether or not we have some a priori knowledge about which traits are likely to influence our dependent variable, the thing that we're measuring. So what do I mean by a priori knowledge? What I mean is this: if we knew ahead of time, when we were designing our experiment, that some of the traits that are likely to vary among our subjects, like breed, were likely to cause differences in the thing we're measuring, so the rate of tail wagging, then that a priori knowledge might lead us to use a particular experimental design, which we're going to talk about more in this video. So that's this yes scenario.
Okay, so if yes, if we did have some a priori sense, before going into the experiment, about what traits of our subjects are likely to influence our Y variable, then we might opt for a particular experimental design, which we're going to talk about more. If we don't have any a priori sense of the types of characteristics that are likely to influence our Y variable, then probably the best approach when designing our experiment is to just use the simple randomization procedure that we've illustrated so far in this video, and that we talked about so extensively in our first randomization video, where we would simply randomly allocate our subjects to our two or three different treatments. So if we don't have any a priori knowledge about the kinds of traits that influence our Y variable, then go simple. It's reliable, it's robust, and it will work.
If we do have some a priori knowledge, then we might want to adopt some alternative design. For example, let's imagine that we knew that different breeds wag their tails to different extents. Then we would want to design our experiment in such a way that we could reduce the influence of breed on tail wagging. We could do that in a number of ways. One simple way is to focus our experiment on one particular breed. By focusing on one breed, we may greatly decrease the amount of variation that we have within our treatments for the rate of tail wagging. As a result, that decreased variation within our treatments will make it easier to detect and estimate the size of the difference in tail wagging between our treatments. There's a problem with this, though, which is that when we only use one breed of dog, this greatly reduces the generalizability of our conclusions. Because we've only looked at one breed, we won't know whether or not the results from our experiment will apply to other breeds.
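For reference, the simple randomization procedure recommended above can be sketched in a few lines. This is only a minimal illustration, not something shown in the video: the function name `randomize`, the puppy labels, and the seed are all hypothetical, and it assumes we want group sizes that are as equal as possible.

```python
import random

def randomize(subjects, treatments, seed=None):
    """Randomly allocate subjects to treatments in (near-)equal numbers."""
    rng = random.Random(seed)      # seeded only so the allocation can be reproduced
    shuffled = list(subjects)
    rng.shuffle(shuffled)          # put the subjects in a random order
    allocation = {t: [] for t in treatments}
    # Deal the shuffled subjects out to the treatments in turn.
    for i, subject in enumerate(shuffled):
        allocation[treatments[i % len(treatments)]].append(subject)
    return allocation

# Hypothetical subject labels, for illustration only.
puppies = [f"puppy_{i}" for i in range(1, 13)]
groups = randomize(puppies, ["no petting", "petting"], seed=1)
for treatment, members in groups.items():
    print(treatment, members)
```

Restricting the list of subjects to a single breed before calling the function would give the one-breed design just described, with the same cost to generalizability.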
Now, given that we're talking about a potentially silly experiment, this might seem a bit esoteric. But this kind of situation arises all the time in biology. For example, in the biomedical sciences, researchers will often focus on a particular genotype of mouse. They might do that for a number of reasons, but one of those reasons is often to reduce the amount of genetic variation in their study and, as a result, to reduce the amount of variation within their treatments. So those decisions to focus on just one genotype are often made for good reasons. But there are costs, and those costs are reducing the generalizability of our conclusions.
So are there alternatives? Yes, of course there are, otherwise I wouldn't be going on with this video. What I'd like to do for the rest of this video is to propose an alternative to focusing on just one particular subset or subtype of subject. What we can do instead is design our experiments using something called blocking. Blocking basically involves splitting our subjects up into specific groups, where the individuals within groups, or within blocks, have some characteristic in common.
I've tried to illustrate that here with this cartoon, where I've imagined that breeds differ in their rate of tail wagging. As a result, what I'd like to do is block our subjects by breed. So we have Labradors in this block, we have this type of dog, which I sadly don't know, in our block number two, and we have pugs in our third block. And for reasons we'll talk about later in this video, this experimental design could potentially give us a greater ability to detect differences between our petting treatments. If you haven't picked up on this already, the word block is really just statistician-speak, or statistician lingo, for a group. That's all it is. We could equally call these group one, group two, group three.
But when we use the term block, that's commonly used language when talking about experimental design, and it immediately lets the reader know that we're talking about a particular approach, which just involves blocking, or grouping.
So how do we create these groups, or these blocks? There is a method to the madness of creating blocks. To illustrate this, let's imagine that instead of blocking, or grouping, by breed, we're going to block by size. What I've done here is arrange our subjects in a way that follows the first step for creating blocks. The first step is to rank our subjects by our blocking variable. So if we're going to block our subjects by size, then what I've done is just order all of our subjects according to size. I've made my best guess here. Sometimes it's a little bit difficult to say whether or not this one is larger than that one, based on the different shapes of these figures. But I've done my best to say that I think this is the smallest subject and this is the biggest subject, and we're going from smallest to largest in this way. Okay, so that's the first step. We want to organize our subjects by the variable that we want to block by.
The second thing we do is take our ordered, or ranked, subjects and split them into groups, which we call blocks. But we want to do this in a particular way. We want to split them in such a way that the number of subjects within each of our blocks is a multiple of the number of treatments that we have. In our petting experiment, we only have two treatments. We have petting and no petting, so that's two treatments. And so we want the number of subjects within each of our blocks to be a multiple of two. We've done that, because we have four individuals in this block and four individuals in this block.
And I haven't put a square around this last group, but these one, two, three, four, these are the individuals that go into our last block of large puppies. Since four is a multiple of two, we have fulfilled this second criterion. The reason for having the number of individuals within blocks be a multiple of our number of treatments is that this allows us to make sure that, within every block, we can have an equal number of subjects in each of our treatments. That's why we're assigning our number of individuals per block in this way.
Now, in our last step, what we're going to do is take the individuals within each of our blocks and randomly assign them to each of our treatments. So we're going to take these four individuals here. These are the same four individuals from our first block, so block one, and I've randomly assigned those individuals to the no-petting and petting treatments. So that's the last step, and that's how you would go about creating an experiment that has a combination of blocks and treatments. We call this design a randomized block design, for hopefully obvious reasons.
Before we go any further, we should ask: how does blocking actually help us? Well, let's imagine we had an experiment that looks like this, where we had two different treatments, treatment one and treatment two. This is just a generic experiment where we want to compare something we've measured for treatment one to what we find in treatment two. Say we just want to compare the average value of these data to the average value of these data. And what I'd like you to notice is that within each of these groups, we have a fair amount of variation.
If you were to just sit back and eyeball these data, it might look like, yes, there's a tendency for there to be a difference between the mean of this group and the mean of that group. But the fact that we have so much variation within each of these groups makes it less obvious that we have a difference between our two treatments. All of that can change when we include blocking in our experiment.
What I'm going to do on the next slide is just color-code each of these data points according to a blocking variable. The data in this slide and what we're going to see in the next slide are actually identical. I used R to create these plots, and R will also shift around the position of these points horizontally. But I assure you that these data are identical between these two slides. All I've done is change the coloring. This coloring here represents the different subjects that lie within three different blocks, or three different groups. So these individuals in green all fall within one block, blue is another block, and red is another block.
Now, what we can do when we have blocked our data in this way is include block as a variable in our analysis. When we do that, our analysis will control for the variation that occurs among our blocks when making comparisons between our treatments. In essence, what that means is that when we perform our analysis, we're going to be making comparisons between our treatments within each of our blocks. So we're going to be comparing the mean of this green block to the mean of that green block, the mean of this blue block to the mean of this blue block, and the mean of this red block to the mean of this red block. This can be a more powerful approach because, as you'll now see, we have less variation within each of our blocks.
And so that can make the comparison between our two treatments a more powerful comparison, especially when we use all of these data together. When we use all the information from all of our various blocks together, we can increase our power to detect the difference between our two treatments. So that's what we get out of blocking. When we include block in our analysis, we essentially create a situation where we can detect differences between our two treatments more effectively, and we can measure the effect of the treatments themselves more precisely.
What I'd like to do now is shift gears and talk about other ways in which we can incorporate blocking into an experiment. So far I've just focused on blocking based on the characteristics of subjects, but we can block by other aspects of an experiment as well. Let's imagine we're performing an experiment in a greenhouse like this. We've seen this slide in previous videos as well. You can see that we're performing this experiment on different benches, and it's very reasonable to expect that the conditions the plants experience will differ among these benches. Indeed, if we look at data that come from an experiment just like this one, we can look at the effect of bench on the plant characteristics. Here are some data from one bench, here are some data for another bench, and data for another bench. These data represent the number of branches produced by the plants growing on these benches. You can see that there are pretty substantial differences in the number of branches produced by the plants that experienced this bench compared to the other benches, et cetera. What this tells us is that there is likely substantial variation in the growing environment among these benches. As a result, if we were performing an experiment in this greenhouse, we might want to block by bench.
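Whatever variable we block by, the recipe from earlier in the video is the same: rank the subjects by the blocking variable, split them into blocks whose size is a multiple of the number of treatments, and randomize within each block. Here is a minimal sketch of those three steps. The function name `randomized_block_design`, the puppy labels, and the body weights are all hypothetical and made up for illustration, and the sketch assumes the subjects divide evenly into blocks.

```python
import random

def randomized_block_design(block_values, treatments, block_size, seed=None):
    """Return a list of (block, subject, treatment) rows.

    block_values maps each subject to its value of the blocking variable.
    """
    k = len(treatments)
    if block_size % k != 0:
        raise ValueError("block size must be a multiple of the number of treatments")
    if len(block_values) % block_size != 0:
        raise ValueError("number of subjects must fill whole blocks")
    rng = random.Random(seed)

    # Step 1: rank the subjects by the blocking variable (smallest first).
    ranked = sorted(block_values, key=block_values.get)

    design = []
    # Step 2: split the ranked subjects into consecutive blocks.
    for block, start in enumerate(range(0, len(ranked), block_size), start=1):
        members = ranked[start:start + block_size]
        # Step 3: within the block, randomly assign equal numbers to each treatment.
        labels = list(treatments) * (block_size // k)
        rng.shuffle(labels)
        design.extend((block, subject, label) for subject, label in zip(members, labels))
    return design

# Hypothetical body weights (kg) for twelve puppies; the numbers are made up.
weights = {
    "p01": 3.2, "p02": 11.8, "p03": 7.5, "p04": 2.4,
    "p05": 9.9, "p06": 6.1, "p07": 14.0, "p08": 4.8,
    "p09": 8.3, "p10": 12.6, "p11": 5.5, "p12": 10.7,
}
for row in randomized_block_design(weights, ["no petting", "petting"], block_size=4, seed=42):
    print(row)
```

With twelve subjects, two treatments, and blocks of four, every block ends up with two petted and two unpetted puppies, which is the balance that the multiple-of-treatments rule is meant to guarantee.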
Okay, there are lots of other variables that we could block by as well. Still thinking in terms of variation in space, if you're using an incubator, you might want to block by the shelf that your samples are on within the incubator. Or if you're working with animals that live in cages kept in a cage rack, then you might want to block by the shelf within the cage rack, because cages on a higher shelf, which are closer to the light, might experience different conditions from cages that are lower in the rack and farther away from the light, for example.
We might want to block by time. If our experimental conditions vary over the day or vary among days, then we might want to block by day or by time of day. If we have multiple researchers in an experiment, those researchers might not be making measurements in exactly the same way, so we might have among-researcher variation in how they make their measurements. If that's the case, then using more than one researcher will increase the variation in our data, and we might want to block by researcher. The same goes if we're using different pieces of equipment, or if we're working with different litters, or if we're housing individuals in different cages. These are all aspects of an experiment that we might want to block by in our experimental design.
Finally, I'll return to the perspective that we started with in this video and point out that we can block by any feature of our subjects that we can measure. For example, you might want to block by body size, and I highlight body size because we know from vast amounts of literature that body size can influence many aspects of biology.
Now, this previous slide, this one here, might give us the impression that we might want to create experiments where we block by everything. So I want to ask now, is that something that would actually be wise? Should we block by lots and lots of different variables?
And the answer is no. There are at least two reasons for this. The first is that if we use blocking in an experiment and include block in our analysis, then blocking will actually reduce the experiment's power if the blocks do not explain any variation. That's because estimating the block effects uses up degrees of freedom, and if the blocks explain no variation we get nothing back in return. In other words, if we think ahead of time that a blocking variable, like age or bench in a greenhouse, is going to contribute variation to the thing we're measuring, and it turns out it doesn't, then including block in your experiment can actually make your experiment less powerful. Obviously, that's something we want to avoid. We want to include blocking as a means to increase the sensitivity of our experiments, not decrease it. As a result, we should really only use blocking if we have some a priori knowledge about what types of traits are likely to influence our data.
So how do we get that a priori knowledge? As a first step, I would simply say: just try it when you run a particular experiment. Very often, the experiments we conduct have similarities with one another. If we conduct one experiment in a greenhouse, we're likely to use that greenhouse again. So if in one experiment we test for differences among our benches, that will give us some knowledge about whether or not we should be blocking by bench in future experiments. The same thing can be true for looking among litters, if we're using multiple litters in an experiment, or for anything else that we might be interested in blocking by. So it can be really useful to go out and get that a priori knowledge, to get the kind of data that can inform our decisions when designing future experiments.
There's also another situation where we should probably not consider blocking, and that is in cases where we have a small number of subjects.
So let's imagine we had an experiment that looks like this, where we only had, say, two subjects in each of our block-treatment combinations. We all know that when we perform experiments, accidents happen, and it's possible that we might not be able to use data from all of our subjects. Let's imagine, for instance, that we lost data from these two subjects. If we lost data for all the subjects within a particular block-treatment combination, as we have here, then what this leads us to is something called an incomplete design, where we do not have data for all of our different combinations of treatment and block. The point I want to raise here is that in that type of situation it can become much more challenging to analyze your data. As a result, we want to avoid situations where we could end up with incomplete designs. It's best to use blocking when we can have reasonable sample sizes per block-treatment combination, so that if we do end up losing some of our subjects, we will not be left in a situation where we have no data for one of our block-treatment combinations. So it's best to use blocking when we have reasonable sample sizes per block.
The last thing I'd like to point out is that for certain types of blocking designs, there's actually a hidden assumption. The kind of experimental design that I want to highlight is the one that I've illustrated here, where I'm imagining that we have three different treatments. It doesn't have to be three treatments. It could be two or four treatments. And we have a series of blocks. But the essential feature here is that we only have a single subject for each of our block-treatment combinations. For this type of experimental design, we have to assume that there is no interaction between our treatment and our blocks. What exactly is an interaction? That's something we're going to talk about in much more detail.
We'll do that when we talk about general linear models, specifically general linear models where we're dealing with more than one factor, or where we're combining a factor and something called a covariate. So my point here is that we're going to talk about exactly what interactions are in a lot more detail in the videos where we deal with general linear models. At the moment, I'm just going to give you a very quick description of what we mean by an interaction.
Interactions occur in an experiment like this when the effect of a treatment depends on the particular block that we're looking at. For example, in this block we might find a relatively large difference between treatment one and these other treatments, whereas in another block we might find that there's really very little difference between treatment one and these other treatments. If that effect is real, then what it would imply is that the effect of our treatments depends on the particular environment in which we are examining those treatments. That kind of phenomenon arises all the time in biology. It's one of the most interesting aspects of biology.
But with this type of experimental design, we cannot actually test whether or not there is an interaction between our treatment and our blocks. That is, we cannot test whether or not the size of the difference between our various treatments depends on the particular block that we're considering. That's because, in order to test for this kind of interaction, we need to have multiple subjects within each of our treatment-block combinations. I'll show you an example of that in a moment. So the main point of this slide is to say that if you design an experiment like this, then you are forced to make an assumption that the effects of the treatments will not depend on the particular blocks you're looking at.
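One way to see why a single subject per treatment-block combination rules out a test for the interaction is to count degrees of freedom. The sketch below is standard two-way ANOVA bookkeeping rather than anything shown in the video. The function name `anova_df` is hypothetical, it assumes a balanced design with block treated as an ordinary factor, and the choice of four blocks is just an assumption to go with the three treatments on the slide.

```python
def anova_df(n_treatments, n_blocks, n_per_cell, fit_interaction):
    """Degrees of freedom for a balanced treatment-by-block design."""
    t, b, n = n_treatments, n_blocks, n_per_cell
    total = t * b * n - 1                  # total df: number of observations minus one
    df = {"treatment": t - 1, "block": b - 1}
    if fit_interaction:
        df["interaction"] = (t - 1) * (b - 1)
    # Whatever is left over is available for estimating the error variance.
    df["residual"] = total - sum(df.values())
    return df

# Three treatments, four blocks (assumed), one subject per combination.
print(anova_df(3, 4, 1, fit_interaction=False))
print(anova_df(3, 4, 1, fit_interaction=True))
# Same layout with two subjects per combination.
print(anova_df(3, 4, 2, fit_interaction=True))
```

With one subject per combination, fitting the interaction leaves zero residual degrees of freedom, so there is nothing left to test it against. With two subjects per combination there are residual degrees of freedom to spare. So with a single subject per combination, the assumption of no interaction between treatment and block cannot be checked from the data.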
If you're not comfortable with that kind of assumption, and there are good reasons not to be comfortable with that kind of assumption, you'd probably want to design your experiment at least like this. We'd want to have at least two subjects in each of our treatment-block combinations. Two subjects per treatment-block combination is the minimum, and there are often good reasons for having more than two subjects per block-treatment combination, such as the situation we were just discussing. But the point here is that if you have multiple subjects per treatment-block combination, then what you can do is include an interaction between block and treatment in your analysis, which is something we'll discuss more when we're talking about general linear models. We can specifically look for those types of interactions, and if we find them, then that could be an interesting outcome of the experiment.
So I'm going to summarize what we've been talking about in this video. The two main points of this video are these. First of all, blocking can be a really effective way of increasing an experiment's ability to detect an effect, and to precisely estimate the effect of something in an experiment. But, second, we need to think about when blocking would be an appropriate way to go when we're designing an experiment. Specifically, blocking is best used when we can guide our experimental design by a priori knowledge. I'll end the video there and say, I hope this video has been helpful, and thank you very much.