Diff-in-diff may be written down six methods!

June 4, 2026

80

The opposite day I wrote a brief (for me) substack put up about when covariate imbalance throughout therapy and management will mechanically break parallel tendencies. Earlier than I transfer into covariates, I needed to only lay out some issues connecting diff-in-diff to regressions. I need to as a result of it’ll make my dialogue of covariates and bias simpler.

So whereas this should do with linear regression, but it surely received’t should do with staggered. I’ll do all of this with easy 2x2s estimated with a regression mannequin. I simply need to make certain we’re all first on the identical web page that diff-in-diff is a 2×2, and since that’s 4 averages and three subtractions, there’s two methods to do it manually, and 4 regressions too.

However earlier than I do, I flipped three cash and two got here up tails, so this one received’t be paywalled. Thanks once more for all of your help!

First do you know that diff-in-diff will not be a regression? Somewhat it’s “4 averages and three subtractions”. You are taking two teams, noticed earlier than and after some intervention, one in every of whom was uncovered and the opposite who wasn’t, and also you subtract them so as.

However do you know there are literally SIX methods — depend them six — other ways to do the equivalent diff-in-diff calculation? 4 with a regression, and two with “4 averages and three subtractions”. Let me present you.

First variations, then distinction

That is the usual 2×2. I’ll let the therapy be faculty. And the therapy group will probably be faculty educated employees, and the management group will probably be highschool (solely) educated employees. And the result, Y, will probably be their annual earnings. The traditional 2×2 is:

(δ^{2×2}=(Y_1,put up−Y_1,pre)−(Y_0,put up−Y_0,pre))

Let’s put it in a desk so you possibly can see it. You subtract left to proper for every group to get their first distinction. You then go down and subtract these two first variations to get the diff-in-diff estimate.

It isn’t but causal as a result of it’s by no means causal till you’ve gotten changed noticed outcomes with their corresponding potential outcomes. As soon as we try this, then we are able to see the causal time period and the bias time period. For now it’s only a easy calculation and nothing extra.

I’ll present it to you now utilizing the Card and Krueger outcomes from their minimal wage paper. Make certain you all the time go “therapy first” once you do that. For first variations, it’s “after minus earlier than” for the handled, then the management, then subtract the management first distinction from the therapy group distinction. See if yow will discover on right here what the counterfactual development is for New Jersey.

Group variations, the distinction

Look intently at that desk. Watch me now under. As a substitute of going left to proper, we are able to go down and take group variations. Subtract the management group from the therapy group for each the earlier than interval and the after interval. Then go left to proper with that third subtraction and voila — you arrive on the similar quantity. Why is that? Watch under.

(δ^{2×2}=(Y_1,put up−Y_0,put up)−(Y_1,pre−Y_0,pre))

See, for the reason that 2×2 is only a sequence of subtracted phrases, you possibly can transfer them round. And due to this fact as a result of you possibly can transfer them round, then the order doesn’t matter. The indicators on them matter in fact, however that’s what I imply by transferring them round — you’re taking the indicators with you. And once more right here is the Card and Krueger illustration of “group variations first”.

Discover that the 2×2 calculation is equivalent. Now see should you can determine the “parallel group distinction” is, because the parallel tendencies assumption keep in mind is simply one other 2×2, however on Y(0) as an alternative of Y, so it’s in right here too. And keep in mind — the “parallel distinction” may have the very same kind because the 2×2 you calculated, so see should you can’t put what it’s once we specific it as group variations first.

So there are two methods to put in writing down the 2×2. However as most individuals don’t consider the diff-in-diff as a 2×2 within the first place — they usually see it as a regression, and a “two manner mounted results” regression at that — I believe that doesn’t register as that large of a deal.

However do you know there’s 4 regressions that can calculate that actual diff-in-diff quantity? I imply by that that you could estimate the 2×2 utilizing 4 completely different OLS regression specs. Let me present you them.

Saturated regression (in dummies)

The primary specification is easy. You create a therapy dummy equaling one for everybody within the therapy group mixed and equaling zero for everybody within the management group mixed. So this isn’t panel mounted results. You additionally create a dummy for the put up therapy interval. After which that is the regression specification:

(Y_{it} = beta_0 + beta_1 D_i + beta_2,textual content{Put up}_t + delta,(D_i occasions textual content{Put up}_t) + varepsilon_{it})

The coefficient on the delta is numerically equivalent to each of the handbook 4 averages and three subtractions. We did. Each. Not simply the primary distinction one — it’s additionally numerically equivalent to the group variations one.

Twoway mounted results regression

Now as an alternative of a single therapy dummy, put in a dummy for everybody — unit mounted results in different phrases. Then run this.

(Y_{it} = alpha_i + lambda_t + delta,(D_i occasions textual content{Put up}_t) + varepsilon_{it})

the place alpha and lambda are unit and time mounted results, respectively. And once more, the delta coefficient is numerically equivalent to each of the handbook 2×2 calculations in addition to the earlier regression we wrote down.

First distinction regression

Subsequent go in steps. First take a primary distinction by subtracting the baseline end result from each items equation so that you’re left with a cross-section and your end result is now “distinction in outcomes”. Then regress that onto a therapy dummy (no interplay).

(Delta Y_i = beta_0 + delta, D_i + varepsilon_i)

That’s numerically equivalent to each 2×2 handbook calculations, and each of the earlier regression specs as properly, however discover it went into two steps. You are taking a primary distinction, then you definately regress the primary distinction on to a therapy dummy.

Group distinction regression

Lastly, take group variations (therapy minus management) so that you just finish with two group values equal to Delta Y_t. Then regress that onto a put up dummy.

(Delta Y_t = beta_0 + delta,textual content{Put up}_t + varepsilon_t)

And right here is the final one. This one you do the “group distinction first” (i.e., delete the imply end result for the management from the therapy within the pre and put up interval). Collapse it to time, then regress that on to the put up dummy.

So that you see, it’s not precisely correct imo to confer with diff-in-diff as regression as a result of it’s truly 4 regression specs, and it’s two completely different handbook calculations. I encourage you to run this code to persuade your self.

And should you needed the weighted model:

So the following diff in diff put up will decide up on this by analyzing the completely different biases of linear regressions controlling for covariates. However right now was merely to point out that I’ll be working with the 2×2, I’ll be working with regressions that estimate the 2×2, and whereas I may do it any variety of methods, I’ll be utilizing for simplicity the saturated regression (the primary spec above), as that’ll be extra acquainted I believe to folks.

However you by no means know! My hunch is it’s a powerful perception for some, notably folks nearer in age to me, that diff-in-diff a synonym for regression, and a selected regression (normally TWFE), however as I present, diff-in-diff is a 2×2 that may be estimated with regressions — simply not technically the identical regression specification. But weirdly, when you write them down otherwise, they’re numerically equivalent on the purpose estimate. (Inference I’ll depart for an additional day if that’s okay).

Diff-in-diff may be written down six methods!

Related Articles

5 Key Ideas Behind Agentic AI Each Engineer Should Perceive

Learn how to execute queries in parallel utilizing EF Core

Language Mannequin Hallucination Analysis with GraphEval

Latest Articles

5 Key Ideas Behind Agentic AI Each Engineer Should Perceive

Learn how to execute queries in parallel utilizing EF Core

Language Mannequin Hallucination Analysis with GraphEval

Intel simply posted its greatest progress in 15 years – and burned billions to make it occur

One in every of NASA’s Most Necessary Deep Area Observatories Hit by Spanish Wildfires