
Building and Optimizing Randomized Complete Block Designs using SAS

Last Updated on February 27, 2022 by Editorial Team

Author(s): Dr. Marc Jacobs

Originally published on Towards AI.


In the animal sciences, the most widely used design is the Randomized Complete Block Design (RCBD). It is comparable to the stratified randomized design often used in the life sciences. Its ingenuity lies in using the design of an experiment to include variance in order to exclude variance. That is correct: we include a known source of variance in our design so we can control it, and when the time comes to analyze the data we can exclude it. I have written about randomization and blocking in a previous introductory post. In this post, I will dive deeper into the RCBD.

The RCBD is a combination of building blocks and randomization. There really is nothing more to it.

Here, you can see pigs being randomized across treatments. Since all the pigs are the same, randomization is for show.

Now, let's say that our pigs are not all the same color. If we simply randomized the pigs, we could end up with an uneven distribution of color across our treatments. The color of the pig would then be a covariate we need to deal with in our analysis. Although this is fine, it is much more efficient and effective to deal with a potential confounder in the design stage.

Randomization does not automatically mean that known covariates are evenly distributed across treatments, especially in small samples. And there is only so much an analysis can correct for.
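To get a feel for how rarely simple randomization balances a known covariate in a small study, here is a minimal simulation sketch. It is written in Python rather than SAS, and the setup (24 pigs, six colors labeled A to F, four treatments, 2000 runs) is an assumption chosen to mirror the pig example, not data from a real trial.

```python
import random
from collections import Counter

def crd_is_color_balanced(colors, n_treatments, rng):
    """Randomly allot animals to treatments and report whether every
    treatment received its fair share of each color."""
    order = list(range(len(colors)))
    rng.shuffle(order)
    per_treatment = len(colors) // n_treatments
    target = Counter(colors)
    for t in range(n_treatments):
        group = order[t * per_treatment:(t + 1) * per_treatment]
        counts = Counter(colors[i] for i in group)
        # balanced: this treatment holds 1/n_treatments of each color
        if any(counts[c] * n_treatments != target[c] for c in target):
            return False
    return True

rng = random.Random(1)
colors = [c for c in "ABCDEF" for _ in range(4)]  # 6 hypothetical colors x 4 pigs
runs = 2000
balanced = sum(crd_is_color_balanced(colors, 4, rng) for _ in range(runs))
balanced_share = balanced / runs
print(f"perfectly balanced allotments: {balanced_share:.3%}")
```

Under these assumptions, a perfectly balanced allotment is a rare accident, which is the argument for handling color in the design instead.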

A completely randomized design is a design of large numbers. It assumes that, by randomizing from a population, the samples representing each treatment will be equal on all characteristics except the treatment, so the comparison will be pure. In animal science, this will rarely be the case because we cannot use the number of animals needed to reach this level of purity. However, we do have two alternatives:

  1. Include color as a covariate in the model
  2. Include color in the design → Randomized Complete Block Design

We will now show how an RCBD works in practice. First, we create blocks, and we create just as many blocks as there are colors. Often, such a one-to-one relationship is not possible, and you need to create blocks that have very little within-block variance and a lot of between-block variance.

Six blocks for six colors.
Now, we have 6 blocks of 4 animals each. Each block consists of a homogenous group of animals (color).

A Randomized Complete Block Design actually means ONE replicate per treatment*block combination. This will lead to the example below.

Each treatment has the same distribution of color, meaning that each treatment has a pig of each color. THIS is an RCBD.

As a result of the blocking, each treatment now has a similar distribution with regard to the blocking factor → COLOR. Here, block represented the color of the pigs, and by blocking we made sure that each treatment has one pig of each color. However, we rarely have such nice predefined levels of a blocking factor. In addition, to get a good estimate of the variance attributable to the blocking factor, we need more than two levels of the blocking factor, which can be difficult to accomplish.

In summary, a Randomized Complete Block Design ensures that:

  1. treatments are randomized within each block
  2. every block contains all treatments
  3. there is one replicate per block*treatment combination
  4. each block is a mini-experiment since we compare treatments within each block.
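The four points above can be sketched in a few lines. This is a minimal illustration in Python rather than SAS. The function name `make_rcbd`, the color labels, and the treatment labels are all hypothetical.

```python
import random

def make_rcbd(blocks, treatments, seed=None):
    """Return {block: [treatment per unit]} with every treatment
    appearing exactly once in every block, in random order."""
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)      # randomize within the block only
        layout[block] = order
    return layout

# hypothetical block labels (pig colors) and treatments
colors = ["white", "pink", "brown", "black", "spotted", "red"]
treatments = ["A", "B", "C", "D"]

layout = make_rcbd(colors, treatments, seed=42)
for block, order in layout.items():
    print(block, order)
```

Each row of the printout is one block, and each block holds all four treatments exactly once, so every block can be read as its own mini-experiment.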

In general, an RCBD is more efficient than a Completely Randomized Design (CRD) since units within blocks are more similar (homogenous) than units between blocks (heterogeneous). Hence, after accounting for the block variation, the experimental error becomes smaller.
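Written out, the usual additive model behind this statement looks as follows. The notation is my own assumption (it does not appear in the original figures): t treatments, b blocks, treatment effect τ, block effect β, and one observation per treatment*block combination.

```latex
\begin{align*}
y_{ij} &= \mu + \tau_i + \beta_j + \varepsilon_{ij},
  \qquad i = 1,\dots,t, \quad j = 1,\dots,b \\
SS_{\text{total}} &= SS_{\text{treatment}} + SS_{\text{block}} + SS_{\text{error}} \\
\underbrace{tb - 1}_{\text{total}} &= \underbrace{(t-1)}_{\text{treatment}}
  + \underbrace{(b-1)}_{\text{block}} + \underbrace{(t-1)(b-1)}_{\text{error}}
\end{align*}
```

In a CRD analysis of the same data there is no block term, so the block sum of squares and its b − 1 degrees of freedom end up in the error term.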

The blocks capture variance by purposely putting it there. By using an RCBD you can control variance so it does not surprise you. In fact, you are using known variance to your benefit, making your design stronger. The more variance you can capture in your blocks, the better.

The graph above shows that the variance in each treatment has been partitioned into blocks. Each block is a mini-experiment with its own treatment difference and specific variance. This design increases the signal-to-noise ratio as the treatment differences across the blocks are similar.

The biggest question that comes to mind is WHEN blocking is actually helpful. As you might have guessed, its helpfulness is dependent on its ability to capture variance. The more variance, the better.

The more variance the block can capture, the better the block / total variance ratio becomes.

Blocking is all about explaining the variation YOU created. If your blocks don't capture variation, it is useless to design a blocked study, and it can even be harmful to your power. To decide if blocking is helpful you need to ask yourself:

  1. What is my main outcome of interest?
  2. Are there sources of variation that will decrease my signal-to-noise ratio?
  3. How large will this decrease be?
  4. Will a randomized block design help me to limit the decrease?
  5. Will a randomized block design help me to actually increase the signal-to-noise ratio?

In an optimal Randomized Complete Block Design, you have:

  1. Homogeneity within the block
  2. Heterogeneity across the blocks

If the blocks are too large, or there are too few blocks, the blocks will be too variable and thus unable to efficiently catch and delete variance.

The blocks used to the left show too much within-block variance compared to the block structure to the right. This is extremely important. Blocking that does not contain more between-block than within-block variance is useless at best and harmful at worst.

The more variance a blocking structure can capture, the fewer blocks you need to capture the 'true' variance within a single study. As with sample size, it is better to have more blocks than fewer. In the end, it depends on the ability of the blocks to create between-block variance.

Now, what if you design an RCBD but analyze it as a CRD? By doing so, you fail to capture the variance attributable to block. This variance will go in the garbage can (the experimental error), and you will decrease the signal-to-noise ratio → treatment effect / experimental error. In addition, you will increase your chance of committing a type II error because there is less true experimental error than indicated by the model.

The cost of analyzing an RCBD as a CRD depends on the variance attributable to the blocking structure.
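A small hand-computed ANOVA shows what happens to the error term. This is a sketch in Python rather than SAS, and the numbers are made up for illustration (four blocks with clearly different levels, three treatments). They are not taken from the study shown in the figures.

```python
from statistics import mean

# rows = blocks, columns = treatments; illustrative values only
data = [
    [10.1, 11.0, 12.2],
    [14.0, 15.1, 15.9],
    [18.2, 18.9, 20.1],
    [22.0, 23.2, 23.9],
]
b, t = len(data), len(data[0])
grand = mean(v for row in data for v in row)
trt_means = [mean(row[j] for row in data) for j in range(t)]
blk_means = [mean(row) for row in data]

ss_total = sum((v - grand) ** 2 for row in data for v in row)
ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
ss_blk = t * sum((m - grand) ** 2 for m in blk_means)

# CRD analysis: block variation stays in the error term
mse_crd = (ss_total - ss_trt) / (t * (b - 1))
# RCBD analysis: block variation is removed from the error term
mse_rcbd = (ss_total - ss_trt - ss_blk) / ((t - 1) * (b - 1))

ms_trt = ss_trt / (t - 1)
f_crd, f_rcbd = ms_trt / mse_crd, ms_trt / mse_rcbd
print(f"MSE  CRD={mse_crd:.3f}  RCBD={mse_rcbd:.3f}")
print(f"F    CRD={f_crd:.2f}  RCBD={f_rcbd:.2f}")
```

The treatment mean square is identical in both analyses. Only the denominator changes, which is why the penalty for ignoring blocks grows with the amount of variance the blocks carry.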

Blocking is like signing a contract. Once you are in, you are in, and these blocks are a useful (and often necessary) tool depending on your research objective. Like everything in a design, blocks have to be properly designed and accounted for in the statistical model. For instance, if an interaction between the outcome variable and the initial body weight of the animal is not expected, a Completely Randomized Design may do the job just fine. Here, initial body weight is a quantity and can be used as a covariate.

No discernible differences between analysis methods (hint: this has to do with no missing data as well).

Blocking works best if it represents the true population. However, if you have more interest in a specific part of the population, or you expect more variance in a specific part, it might be useful to include more blocks of this part.

The effectiveness of blocking depends on its ability to be both representative & precise.

Now, what if more studies start with animals from the same population?

Sometimes multiple experiments start with animals from the same pool. The allotment of animals to the different experiments can have a large impact on the outcome. In most cases, a solution is possible that works for all affected experiments. However, this is not always the case, which can be seen in the graph below. Here, a single large pool of animals was used to feed two separate studies. The allotment of the performance study influenced the allotment of the preference study.

How it went (left) and how it should have been (right).

All piglets, except the very smallest, are allotted randomly to the studies. Both studies are representative. However, the performance study in particular is less precise. The number of replicates is the same, but the variance increases. This will have a negative effect on the power of the study.

Now, we have discussed several times the importance of using a correct blocking strategy. The blocks need to be able to capture variance in such a way that the between-block variance is maximized at the expense of the within-block variance. The way the blocks are set up is crucial to this.

Below you see an actual blocking strategy to the left, using the body weight at day zero as the blocking parameter. On the right, you see a simulated blocking strategy, using the distribution of body weight at day zero to inform the size and thus the number of blocks. It does not take great foresight to ascertain which study will prove to have a better blocking strategy.

The 10 blocks capture more variance than the 15 blocks. However, there is still a large amount of variance included. Always be careful.
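One way to judge a blocking strategy before the study starts is to compute how much of the day-zero variation ends up between blocks. The sketch below is in Python rather than SAS. The twelve body weights and the helper `between_block_share` are hypothetical and only illustrate the idea.

```python
from statistics import mean

def between_block_share(values, block_size):
    """Share of the total sum of squares that sits between blocks
    when consecutive values are grouped into blocks of block_size."""
    blocks = [values[i:i + block_size]
              for i in range(0, len(values), block_size)]
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = sum(len(blk) * (mean(blk) - grand) ** 2 for blk in blocks)
    return ss_between / ss_total

# hypothetical day-0 body weights (kg), listed in arrival order
weights = [6.1, 7.4, 5.2, 8.0, 6.8, 5.9, 7.7, 6.3, 5.5, 7.1, 8.3, 6.6]

arrival_order = between_block_share(weights, 4)          # blocks as animals arrive
sorted_order = between_block_share(sorted(weights), 4)   # blocks of similar weight
print(f"arrival order: {arrival_order:.2f}, sorted by weight: {sorted_order:.2f}")
```

A share close to 1 means the blocks are homogenous within and heterogeneous between. A share close to 0 means the blocks capture almost nothing.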

A Randomized Complete Block Design can easily be created and optimized using PROC FACTEX. Below, you see a 2³ full-factorial across four blocks leading to 32 experimental units. PROC PLAN can of course be used here as well.
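For readers without SAS at hand, the structure of that design (not the PROC FACTEX call itself) can be sketched in Python. The factor names A, B, and C and the seed are assumptions made for illustration.

```python
from itertools import product
import random

# three hypothetical two-level factors
factors = {"A": (0, 1), "B": (0, 1), "C": (0, 1)}
runs = list(product(*factors.values()))   # 2**3 = 8 treatment combinations

rng = random.Random(2022)
design = []
for block in range(1, 5):                 # four blocks
    order = runs[:]
    rng.shuffle(order)                    # fresh run order within each block
    design += [(block, *combo) for combo in order]

print(len(design))  # 32
for row in design[:8]:
    print(row)
```

Every block contains all eight treatment combinations once, which gives 4 x 8 = 32 experimental units.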

Then, there is always the DATA step way of building nested datasets. In SAS this is child's play.

Last, but not least, we can use PROC OPTEX to build an RCBD.

Now, let's start analyzing an RCBD. Below you see me trying to include a treatment*block interaction. Remember, an RCBD has a single replicate per treatment*block combination, which means that including the treatment*block interaction leaves no information to determine the residual variance, as you can clearly see.

If you want to analyze the interaction between treatment and block, you need to replicate the treatment within a block. However, this will make the blocks larger, which could lead to less homogeneity within a block.
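The degrees-of-freedom bookkeeping behind this can be checked with a few lines. This is a sketch in Python, with `rcbd_error_df` being a hypothetical helper name. The example uses four treatments and six blocks, as in the pig example.

```python
def rcbd_error_df(t, b, reps=1, interaction=False):
    """Residual degrees of freedom for t treatments in b blocks,
    with `reps` replicates per treatment*block combination."""
    total = t * b * reps - 1
    used = (t - 1) + (b - 1)              # treatment and block main effects
    if interaction:
        used += (t - 1) * (b - 1)         # treatment*block interaction
    return total - used

print(rcbd_error_df(4, 6))                            # 15
print(rcbd_error_df(4, 6, interaction=True))          # 0
print(rcbd_error_df(4, 6, reps=2, interaction=True))  # 24
```

With one replicate, the interaction uses up exactly the degrees of freedom that would otherwise estimate the residual variance. With two replicates per combination there is error information again, at the price of blocks that are twice as large.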

Two blocks trying to capture variance by increasing between-block variance.

Hence, a Randomized Complete Block Design is a great design if:

  1. there is enough variance to capture by blocking
  2. the blocks are set up in a way that they are homogenous within and heterogeneous between.

If you have many treatments, it could be quite difficult to obtain homogenous blocks, as you can see in the colored bar here.

Hence, this is where a Randomized Incomplete Block Design (RIBD) can be of great value as it keeps the blocks homogenous by not having all the treatments in each block. A distinction can be made between a balanced and a partially balanced design.

The balanced RIBD has four key characteristics:

  1. Each treatment level is equally replicated
  2. Each treatment DOES NOT appear in each block
  3. Each treatment appears the same number of times over blocks
  4. Each pair of treatments appears in a block the same number of times
A Randomized Incomplete Block Design.

To build such a design, you need to call a built-in SAS macro and then use PROC OPTEX.

A request for an RIBD with five treatments, placing four treatments in each of ten blocks.
I have ten blocks containing four treatments each. Each treatment appears in eight blocks, and each pair of treatments shares a block six times.
Macros requesting a Balanced Incomplete Block Design for five treatments, ten blocks, and four treatments per block.
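These numbers can be verified without SAS. The sketch below, in Python, builds one possible balanced layout by taking every four-treatment subset of the five treatments twice. This is an assumption for illustration and not necessarily the layout PROC OPTEX returns. It then counts replications and pairwise concurrences.

```python
from itertools import combinations
from collections import Counter

treatments = [1, 2, 3, 4, 5]
k = 4                                       # treatments per block
# every 4-subset of the 5 treatments, taken twice, gives ten blocks
blocks = [set(c) for c in combinations(treatments, k)] * 2

b, t = len(blocks), len(treatments)
replication = Counter(trt for blk in blocks for trt in blk)
concurrence = Counter(
    pair for blk in blocks for pair in combinations(sorted(blk), 2)
)

r = b * k // t                              # replications per treatment
lam = r * (k - 1) / (t - 1)                 # times each pair shares a block
print(b, r, lam)                            # 10 8 6.0
print(dict(replication))
print(dict(concurrence))
```

The relation lambda = r(k − 1)/(t − 1) gives 8 × 3 / 4 = 6, matching the eight replications and six pairwise comparisons described above.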

Of course, even though the Incomplete Block Design is balanced, it is no match for an RCBD. Nevertheless, if you do not have the room to accommodate the necessary sample size for an RCBD, I suppose the RIBD is the best way to go.

Comparing RCBD vs RIBD. The estimates may seem completely different, which is due to random sampling, but the most important thing to notice is that the confidence limits are wider for the RIBD.

Since we have a fully balanced incomplete block design, it is a bit of a no-brainer that we can also create partially balanced incomplete block designs. Here:

  1. each treatment does not appear in each block
  2. each treatment does not appear the same number of times over blocks
  3. each pair of treatments does not appear in a block the same number of times
A partially balanced incomplete block design.
Lambda is not an integer anymore, meaning that the pairs of treatments do not share a block equally often.
As you can see, comparisons differ in sample size, which was not the case for the Balanced Incomplete Block Design. Hence the name balanced, whereas this design is only partially balanced.
The imbalance shows in the degrees of freedom available for each comparison.
This will be reflected in the comparisons and mostly in the confidence intervals.

Let's say I have six treatments, four blocks, three treatments per block, and I am mostly interested in comparing 1 vs 2, 1 vs 4, and 2 vs 4. A simple start would be to use PROC PLAN.

My comparisons of interest have the lowest standard error.
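To see why some comparisons end up better estimated than others, it helps to count how often each pair of treatments shares a block. The layout below is a hypothetical example written in Python. It is not the design produced by PROC PLAN, only one arrangement of six treatments in four blocks of three that keeps treatments 1, 2, and 4 together.

```python
from itertools import combinations
from collections import Counter

# hypothetical layout: 6 treatments, 4 blocks, 3 treatments per block
blocks = [{1, 2, 4}, {1, 3, 5}, {2, 3, 6}, {4, 5, 6}]
t, k, b = 6, 3, len(blocks)

r = b * k / t                    # replications per treatment
lam = r * (k - 1) / (t - 1)      # average concurrence per pair, not an integer here

# start every pair at zero so pairs that never meet are visible too
concurrence = Counter({pair: 0 for pair in combinations(range(1, t + 1), 2)})
for blk in blocks:
    concurrence.update(combinations(sorted(blk), 2))

never_together = sorted(p for p, n in concurrence.items() if n == 0)
print(r, lam)
print("pairs of interest:", [concurrence[p] for p in [(1, 2), (1, 4), (2, 4)]])
print("never in the same block:", never_together)
```

In this layout the three comparisons of interest are made directly within a block, while pairs such as 1 vs 6 are only compared indirectly through other treatments. That is the kind of imbalance that shows up in the standard errors.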

Below, you see an example of designing an RCBD using multiple steps of PROC FACTEX to build nesting of fixed and random components, starting with block, adding treatment, and then adding sex as factors.

The dataset grows as I increase the number of nested variables.
In the end, I have ten blocks, each of which contains four treatments, and each treatment has four males and four females.

One can also easily ask SAS to transform a full-factorial into an RCBD, of which you can see two examples below.

In summary, there are two major reasons for blocking a study: (1) practical, and (2) statistical.

PRACTICAL reasons for blocking

  1. Conditions cannot be kept constant in the study.
  2. Divides experiment into mini-experiments with homogenous animals.
  3. The experiment is more manageable.

STATISTICAL reasons for blocking

  1. Control variance by creating and accounting for created variance.
  2. Cancel out block-to-block variation.
  3. Estimate treatment differences better → standard error only due to experimental error.


Building and Optimizing Randomized Complete Block Designs using SAS was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
