Sunday, December 7, 2025

Econometric Sense: Blocking and Causality


In an earlier post I discussed block randomized designs.

Duflo et al. (2008) describe this in more detail:

“Since the covariates to be used must be chosen in advance in order to avoid specification searching and data mining, they can be used to stratify (or block) the sample in order to improve the precision of estimates. This technique (first proposed by Fisher (1926)) involves dividing the sample into groups sharing the same or similar values of certain observable characteristics. The randomization ensures that treatment and control groups will be similar in expectation. But stratification is used to ensure that along important observable dimensions this is also true in practice in the sample….blocking is more efficient than controlling ex post for these variables, since it ensures an equal proportion of treated and untreated unit within each block and therefore minimizes variance.”
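To make the mechanics concrete, here is a minimal sketch of block randomization in Python. It is not taken from Duflo et al.; the function name `block_randomize`, the made-up student sample, and the even split within each block are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def block_randomize(units, block_of, seed=0):
    """Assign 'treatment' or 'control' separately within each block.

    Hypothetical helper: shuffles the members of each block and sends
    the first half to treatment, the rest to control.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for i, unit in enumerate(shuffled):
            assignment[unit] = "treatment" if i < half else "control"
    return assignment


# Made-up sample: 6 public-school and 4 private-school students.
students = [("public", i) for i in range(6)] + [("private", i) for i in range(4)]
assignment = block_randomize(students, block_of=lambda s: s[0])

# Tally arms within each block to check the balance.
counts = defaultdict(lambda: defaultdict(int))
for (school, _), arm in assignment.items():
    counts[school][arm] += 1

for school in ("public", "private"):
    print(school, dict(counts[school]))
```

Whatever the seed, each block ends up split evenly between arms, which is the "equal proportion of treated and untreated" property the quote describes. Under complete randomization that balance would hold only on average.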

They also elaborate on blocking when you are interested in subgroup analysis:

“Apart from reducing variance, an important reason to adopt a stratified design is when the researchers are interested in the effect of the program on specific subgroups. If one is interested in the effect of the program on a sub-group, the experiment must have enough power for this subgroup (each sub-group constitutes in some sense a distinct experiment). Stratification according to those subgroups then ensure that the ratio between treatment and control units is determined by the experimenter in each sub-group, and can therefore be chosen optimally. It is also an assurance for the reader that the sub-group analysis was planned in advance.”

Dijkman et al. (2009) discuss subgroup analysis in blocked or stratified designs in more detail:

“When stratification of randomization is based on subgroup variables, it is more likely that treatment assignments within subgroups are balanced, making each subgroup a small trial. Because randomization makes it likely for the subgroups to be similar in all aspects except treatment, valid inferences about treatment efficacy within subgroups are likely to be drawn. In post hoc subgroup analyses, the subgroups are often incomparable because no stratified randomization is performed. Additionally, stratified randomization is desirable since it forces researchers to define subgroups before the start of the study.”

Both of these accounts are consistent with each other: randomization within subgroups creates a mini trial in which causal inferences can be drawn. But I think the key thing to consider is that they are referring to comparisons made WITHIN subgroups and not necessarily BETWEEN subgroups.

Gerber and Green discuss this in one of their chapters on the analysis of block randomized experiments:

“Regardless of whether one controls for blocks using weighted regression or regression with indicators for blocks, the key principle is to compare treatment and control subjects within blocks, not between blocks.”
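One way to see the "within blocks, not between blocks" principle is to compute a difference in means inside each block and then average those differences, weighting each block by its share of the sample. The sketch below is an illustration of that idea only: the function `blocked_ate`, the weighting by block size, and the tiny data set are assumptions, not code or data from Gerber and Green.

```python
from statistics import mean


def blocked_ate(records):
    """Block-size-weighted average of within-block differences in means.

    records: iterable of (block, treated, outcome) tuples.
    Returns (overall estimate, dict of per-block differences).
    """
    by_block = {}
    for block, treated, outcome in records:
        arms = by_block.setdefault(block, {True: [], False: []})
        arms[treated].append(outcome)

    n_total = sum(len(a[True]) + len(a[False]) for a in by_block.values())
    per_block = {}
    estimate = 0.0
    for block, arms in by_block.items():
        # Compare treated and control units WITHIN the block only.
        diff = mean(arms[True]) - mean(arms[False])
        per_block[block] = diff
        weight = (len(arms[True]) + len(arms[False])) / n_total
        estimate += weight * diff
    return estimate, per_block


# Made-up outcomes. Block B has higher outcomes overall and a larger
# share of treated units than block A.
records = [
    ("A", True, 4.0), ("A", True, 6.0), ("A", False, 3.0), ("A", False, 5.0),
    ("B", True, 10.0), ("B", True, 12.0), ("B", True, 11.0), ("B", False, 8.0),
]

ate, per_block = blocked_ate(records)

# Naive pooled comparison that ignores blocks, shown for contrast.
naive = (mean(y for _, t, y in records if t)
         - mean(y for _, t, y in records if not t))

print(per_block, ate, naive)
```

In this made-up data the pooled comparison is larger than the block-weighted estimate because block B contributes more treated units and also has higher outcomes overall. The pooled number therefore mixes a between-block difference into the treatment comparison.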

When we start to compare treatment and control units BETWEEN blocks or subgroups, we are essentially interpreting covariates, and this cannot be done with a causal interpretation. Gerber and Green discuss an example related to differences in the performance of Hindu vs. Muslim schools.

“it could just be that religion is a marker for a host of unmeasured attributes that are correlated with educational outcomes. The set of covariates included in an experimental analysis need not be a complete list of factors that affect outcomes: the fact that some factors are left out or poorly measured is not a source of bias when the aim is to measure the average treatment effect of the random intervention. Omitted variables and mismeasurement, however, can lead to severe bias if the aim is to draw causal inferences about the effects of covariates. Causal interpretation of the covariates encounters all of the threats to inference associated with analysis of observational data.”

In other words, these kinds of comparisons face the same challenges as interpreting control variables in a regression in an observational setting (see Keele et al., 2020).

But why doesn't randomization within religion allow us to make causal statements about these comparisons? Let's think about a different example. Suppose we wanted to measure treatment effects for some kind of educational intervention and we were interested in subgroup differences in the outcome between public and private high schools. We could randomly assign treatments and controls within the public school population and do the same within the private school population. We know overall treatment effects would be unbiased because school type would be perfectly balanced (instead of balanced just on average, as in a completely randomized design) and we would expect all other important confounders to be balanced between treatments and controls on average.

We also know that within the group of private schools the treatment and control groups should, at least on average, be balanced on certain confounders (median household income, teachers' education/training/experience, and perhaps an unobservable confounder related to student motivation).

We could say the same thing about comparisons WITHIN the subgroup of public schools. But there is no reason to believe that the treated students in private schools would be comparable to the treated students in public schools, because there is no reason to expect that important confounders would be balanced when making that comparison.

Assume we are looking at differences in first semester college GPA. Maybe within the private subgroup we find that treated students on average have a first semester college GPA that is .25 points higher than the comparable control group. But within the public school subgroup, this difference was only .10. We can say that the estimated treatment effects differ by .15 points between groups, but can we say this difference is causal? Is the difference really related to school type, or is school type really a proxy for income, teacher quality, or motivation? If we increased motivation or income in the public schools, would that make up the difference? We might do better if our design originally stratified on all of these important confounders, like income and teacher education. Then we could compare students in both public and private schools with similar family incomes and teachers with similar credentials. But...there is no reason to believe that student motivation would be balanced. We can't block or stratify on an unobservable confounder. Again, as Gerber and Green state, we find ourselves in a world that borders between experimental and non-experimental methods. Put simply, the subgroups defined by any particular covariate that itself is not or cannot be randomly assigned may have different potential outcomes. What we can say from these results is that school type predicts the treatment effect but does not necessarily cause it.
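The GPA example above can be reproduced with a small worked calculation in which school type has no causal role at all. Everything in this sketch is an assumption made for illustration: the individual treatment effects depend only on (unobserved) motivation, and the motivation shares and effect sizes were picked so that the subgroup effects come out to the .25 and .10 used in the text.

```python
from statistics import mean

# Assumed individual treatment effects on first-semester GPA.
# By construction they depend ONLY on motivation, never on school type.
EFFECT_BY_MOTIVATION = {"high": 0.30, "low": 0.05}


def make_school(n_high, n_low):
    """Return a list of motivation labels for one hypothetical school type."""
    return ["high"] * n_high + ["low"] * n_low


# Assumed mix: private schools enroll more highly motivated students.
schools = {
    "private": make_school(n_high=80, n_low=20),
    "public": make_school(n_high=20, n_low=80),
}


def subgroup_effect(school):
    """Average treatment effect within one school type."""
    return mean([EFFECT_BY_MOTIVATION[m] for m in schools[school]])


def cell_effect(school, level):
    """Average treatment effect within a school type and motivation level."""
    return mean([EFFECT_BY_MOTIVATION[m] for m in schools[school] if m == level])


effects = {school: subgroup_effect(school) for school in schools}
gap = effects["private"] - effects["public"]

# Hold motivation fixed and the private/public gap disappears.
gap_high = cell_effect("private", "high") - cell_effect("public", "high")
gap_low = cell_effect("private", "low") - cell_effect("public", "low")

print(effects, round(gap, 2), gap_high, gap_low)
```

The within-school effects are each valid causal estimates in expectation, and the private/public gap is real as a description. But in this constructed world, moving a student from a public to a private school would change nothing, because the gap is driven entirely by the motivation mix. Since motivation is unobserved, the experimenter could not condition on it the way the last two lines do.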

Gerber and Green expound on this idea:

“Subgroup analysis should be thought of as exploratory or descriptive analysis….if the aim is simply to predict when treatment effects will be large, the researcher need not have a correctly specified causal model that explains treatment effects (see to explain or predict)….noticing that treatment effects tend to be large in some groups and absent from others can provide important clues about why treatments work. But resist the temptation to think subgroup variations establish the causal effect of randomly varying one’s subgroup attributes.”

References

Dijkman B, Kooistra B, Bhandari M; Evidence-Based Surgery Working Group. How to work with a subgroup analysis. Can J Surg. 2009;52(6):515-522.

Duflo, Esther, Rachel Glennerster, and Michael Kremer. 2008. “Using Randomization in Development Economics Research: A Toolkit.” T. Schultz and John Strauss, eds., Handbook of Development Economics. Vol. 4. Amsterdam and New York: North Holland.

Gerber, Alan S., and Donald P. Green. 2012. Field Experiments: Design, Analysis, and Interpretation. New York: W.W. Norton.

Keele, L., Stevenson, R., & Elwert, F. (2020). The causal interpretation of estimated associations in regression models. Political Science Research and Methods, 8(1), 1-13. doi:10.1017/psrm.2019.31
