This is going to be brief. I have to teach at Sant’Anna this afternoon in Pisa, and for the last, oh I don’t know, 4-5 hours I’ve been troubleshooting something where I was using Callaway and Sant’Anna (CSDID) on a project. And if you look back over this Claude Series, CSDID is the most common applied econometric method I’ve been using. My various experiments in a paper on the minimum wage, in fact, are all forcing Claude Code to use CSDID. So it’s in that context I want to say a few things.
I’ve become convinced that we cannot step away when automating our research. This common Claude trope where people do all this planning and then let it rip, putting faith in /skills, is totally backfiring. I only know that it is from my own self-experimentation, but I have documented results that it is. For one, I’m finding in this paper of mine that Claude is actually specification searching, and upon finding a particular method that doesn’t contradict the prompt, only writing that up. The only way I found out was because of a comment from an extremely insightful MIT student who came up to me and mentioned that Claude is actually storing all of his decisions in a JSON. So I had that analyzed too, and sure enough, there’s much more under the hood than I was expecting, like turning over a rock and finding a lot of weird bugs.
So that’s one thing.
But it’s not just that. The policy recommendation from these experiments is to pre-register. Maybe you pre-register like literally, but it’s not just that. You have to dictate up front the population estimand. You have to literally write down in potential outcomes which aggregation you’ll be targeting. This is the “forward engineering” approach that we advocate in our “Difference-in-Differences: A Practitioner’s Guide” by Baker, et al, 2026 in the most recent issue of the Journal of Economic Literature. That’s the very first step, and frankly, if we’re going to rely more on Claude Code and Codex for the coding, I actually think, to be totally honest, that clear articulation of the population estimand could not be more important.
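As one example of what “writing it down in potential outcomes” can look like, here is a sketch in Callaway and Sant’Anna-style notation. It assumes G is the cohort (the period a unit is first treated), Y_t(g) is the potential outcome at time t for a unit first treated in g, and Y_t(0) is the untreated potential outcome. The weights w(g,t) are left generic on purpose, since choosing them is exactly the decision being committed to up front.

```latex
% Building block: the group-time average treatment effect on the treated
ATT(g,t) = \mathbb{E}\left[\, Y_{t}(g) - Y_{t}(0) \mid G = g \,\right]

% Target parameter: a weighted average over post-treatment (g,t) cells
\theta = \sum_{g} \sum_{t \ge g} w(g,t)\, ATT(g,t),
\qquad \sum_{g} \sum_{t \ge g} w(g,t) = 1
```

Saying “ATT” without the second line leaves the population and the weights unspecified, which is the gap the pre-registration is meant to close.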
Which leads to my second thought. I had this problem with CS. I’m on a different machine than normal, and I had installed “did” from CRAN. I was getting odd results on the 2x2s. They were spitting out NAs, and I really only caught it because I decided to do something differently. And to do it, I had just gone systematically through a checklist I teach often and which is in the new book in the “advanced diff-in-diff” chapter.
Well, I knew these NAs didn’t make sense. See, CS is “four averages and three subtractions” with weights on the means. So the only thing that should be needed is outcome values in each cell, and baseline covariates in each cell. I was using “regression adjustment” because the number of treated units per cohort was too small to avoid perfect separation on the propensity score, but I wanted to use the covariates to satisfy a parallel trends assumption I’d already committed to up front.
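To make “four averages and three subtractions” concrete, here is a minimal sketch of a single 2x2 with no covariates. It is not the `did` package’s implementation, and it leaves out the regression adjustment described above. The function name, the row layout, and the convention that cohort 0 means never treated are all assumptions made for illustration.

```python
from statistics import mean


def did_2x2(rows, cohort, pre, post, never_treated=0):
    """Unconditional 2x2 DiD for one cohort against the never-treated group.

    rows: list of dicts with keys 'g' (cohort), 't' (period), 'y' (outcome).
    The layout and the never_treated=0 coding are illustrative assumptions.
    """
    def cell_mean(g, t):
        values = [r["y"] for r in rows if r["g"] == g and r["t"] == t]
        if not values:
            # An empty cell is the one thing that should stop this calculation.
            raise ValueError(f"empty cell: cohort={g}, period={t}")
        return mean(values)

    # Four averages
    treated_post = cell_mean(cohort, post)
    treated_pre = cell_mean(cohort, pre)
    control_post = cell_mean(never_treated, post)
    control_pre = cell_mean(never_treated, pre)

    # Three subtractions
    return (treated_post - treated_pre) - (control_post - control_pre)


rows = [
    {"g": 2005, "t": 2004, "y": 10.0}, {"g": 2005, "t": 2004, "y": 12.0},
    {"g": 2005, "t": 2005, "y": 15.0}, {"g": 2005, "t": 2005, "y": 17.0},
    {"g": 0, "t": 2004, "y": 8.0}, {"g": 0, "t": 2004, "y": 10.0},
    {"g": 0, "t": 2005, "y": 9.0}, {"g": 0, "t": 2005, "y": 11.0},
]
print(did_2x2(rows, cohort=2005, pre=2004, post=2005))  # 4.0
```

If every one of the four cells has outcome values, this calculation cannot return a missing value, which is why NAs on a populated 2x2 point to something other than the data.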
Well, I’ve taught Brant and Pedro’s paper about ten billion times since 2021. I would happily say I’m considerably well versed in it for being a non-econometrician. And I probably know it better than the modal econometrician even. I know all the little curves in the road. I know exactly where the potholes are. I know, for instance, that there should not be a coefficient at g-1 because I have strong opinions about it. Unless you have stronger opinions than me about it to do the short difference, the most likely scenario is that one simply doesn’t know. See, it’s human capital. It’s not even skill. It’s human capital from repeated time use. See, human capital is mathematical: you do this enough times, you’re going to get human capital. It’s why some of us get weird human capital in things. You’ll somehow become an expert in how to beat a particular boss on some impossible video game because you had to do it 10,000 times unsuccessfully before you sort of memorized exactly the route to take. That’s basically how CSDID is for me. I’m not saying I’m on the leaderboard; I’m not saying I’m in first place. I’m saying I have human capital in it.
And that meant I knew these NAs made no sense.
So I spent 4 hours on this new machine with Claude Code going over and over and over debugging it. He must have made a mistake. The panel dataset must be a problem. Or the way he was generating the time numbering, since it was not a “real time measure”. For instance, it was in weeks we had artificially made, for which he made an artificial week counter. Maybe that was it. Nope, that was not it. Maybe cells were empty. Nope. Maybe covariates were missing for the never-treated units only, somehow. Nope.
Until this morning I remembered: you know, I have a vague memory that Brant had a different install on his GitHub.
You can either install did from CRAN or “the latest version” from GitHub. And as this was a different machine, I wondered, “I bet you it’s not the same as the one on my machine. I bet you it installed from CRAN”. It did. I had two Claudes open, talking to each other, each trying different things, and nothing worked. And not only that, they were running down conjectures totally happy with the conclusions, and those conclusions were wrong. Each time, they were wrong. The whole time it was just the wrong version. And once we fixed it, it was fine.
Here’s what happened though. Recently, probably because I needed to develop content for these workshops on “using Claude Code for research”, particularly tied to diff-in-diff, I had begun to create a dashboard. I called it “gtd” after “Getting Things Done” by David Allen. You can find it here. It’s like a harness, and it’s a work in progress. It isn’t done. But it’s the foundation.
Well, the dashboard is great as an idea. I haven’t pushed the new changes yet, but I’ve made the figures and tables be “cards” you can click, and they get big and they flip around to where you can find the code that created them, and it literally shows the code. I’ll push it soon so you can see it, but basically I’ve been trying to build more ‘psychological verification’ into the workflow. I call it ‘psychological verification’ to separate it from some formalized verification because historically when I would code, even if there were coding errors in the code, I always knew what made what. I knew where things originated. It was the “epistemological feeling” or something like that. It was a kind of warranted belief because it was connected to feeling and memory. It was connected to confidence.
Well, with Claude Code I don’t get any of that feeling.
I think a lot of manager types, though, don’t get that. They use RAs to do a lot of this work, and so they also don’t have those same “epistemological feelings” or “confidence” or what have you. That kind of “verification” that comes from production. And see, since agents cleanly demarcate and separate production from verification, the psychology of verification is lost since Claude does it for us.
I’m a big believer in what I’m about to say and it’s this. Claude Code and other agents solve many problems and create new ones, and our job is just to solve the new ones. That’s it. It’s not some dystopian horror show that I’ve lost the epistemological feelings from coding myself. It just means I personally have a new problem insofar as I need epistemological feelings for accuracy. And I do. Which is why I’m leaning more heavily into, not /skills, but structured analog workflows.
Structured analog workflows are checklists that you go through. Atul Gawande has a great book called “The Checklist Manifesto”. I want to say it was a bestseller when it came out but I’m not sure. Anyhow, the thing about it is he claims that checklists have been instrumental in reducing mortality in surgeries and crashes in aviation, simply by having the operator go systematically through a checklist and not going forward until the earlier step is completed. People who have attended my workshops know that I usually conclude the discussion of staggered adoption with a 9-step checklist which I call “Pedro’s Checklist” because Pedro Sant’Anna had it in a deck that he presented at Amazon once. It’s stuff like this:
- Name your causal estimand using potential outcomes, populations, and weighted averages. Don’t just say “ATT”. Say the weights and the population and express it as binary comparisons.
- Make a table of treated units by cohort with sample shares. Make it beautiful. Include the counts on the never treated too. We should always be able to answer a simple question like “how many treated units are in the 2005 cohort?” It matters because fitting the propensity score off covariates basically needs 7-10 “events” per covariate. So if you only have 4 treated counties in a cohort, and 5 covariates, you cannot fit that propensity score. You’ll have to make some hard choices. Ideally before you begin, because many of the CSDID packages will not produce the logit coefficients for you to examine; it’s all under the hood. And some of them actually just drop the covariates so that the maximum likelihood converges and won’t even tell you! LOL.
- Plot the rollout using Yiqing Xu’s “panelview”. Why panelview? Why not?
- Have a method for selecting covariates for satisfying conditional parallel trends (read our section 4.2 in the JEL about conditional parallel trends closely). If the covariates cause Y(0) trends, or the parameters on the covariates’ effect on Y(0) change over time and you are imbalanced, then it will mechanically break parallel trends. So you need them. I have a method; I’ll explain in another substack what it is. But for now, you need one. And then when you’ve picked them, you need to make a simple table of means by treated and control and calculate the normalized difference in means, which is (X1 - X0) / sqrt((var1 + var0) / 2). If that’s greater than 0.25 in absolute value you gotta include it, and if it isn’t you’re good to ignore it because they’re balanced “enough”. Which again matters for the propensity score because you really pay for those covariates in that propensity score due to that “7-10 events per covariate” thing.
- Plot the evolution of the outcome means by cohort including the never treated. Make it beautiful. Why? Well, for one, that’s going to be where you see some errors in the data. If never-treated has big missing values or isn’t there at all, guess what? Something is messed up. But you can also tell if you have selection on levels (which is fine) or selection on Y(0) trends (which may not be fine). And who doesn’t love pretty pictures, anyway?
- And then do CS, honest intervals, falsifications with event studies (using long2 or universal baseline please for the love of God).
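The counting and balance checks in the checklist above are simple enough to sketch by hand. The block below is a minimal sketch in plain Python, not how `did` or any other CSDID package does it. The function names, the coding of never-treated units as cohort 0, and the default of 7 events per covariate (the low end of the 7-10 rule of thumb) are assumptions made for illustration; the 0.25 cutoff is the one given in the checklist.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance


def cohort_table(unit_cohorts):
    """Count units per cohort and their sample shares.

    unit_cohorts: dict mapping unit id -> cohort (0 = never treated, by assumption).
    Returns {cohort: (count, share)} sorted by cohort.
    """
    counts = Counter(unit_cohorts.values())
    total = sum(counts.values())
    return {g: (n, n / total) for g, n in sorted(counts.items())}


def thin_cohorts(table, n_covariates, events_per_covariate=7):
    """Treated cohorts with too few units to support n_covariates in a propensity score."""
    needed = events_per_covariate * n_covariates
    return [g for g, (n, _) in table.items() if g != 0 and n < needed]


def normalized_difference(x_treated, x_control):
    """(mean1 - mean0) / sqrt((var1 + var0) / 2), using sample variances."""
    pooled_sd = sqrt(0.5 * (variance(x_treated) + variance(x_control)))
    return (mean(x_treated) - mean(x_control)) / pooled_sd


def is_imbalanced(x_treated, x_control, cutoff=0.25):
    """True when the covariate fails the 0.25 balance rule of thumb."""
    return abs(normalized_difference(x_treated, x_control)) > cutoff
```

With 4 treated counties in a cohort and 5 covariates, `thin_cohorts` would flag that cohort, since even the low end of the rule asks for 35 treated units. That is the point at which the hard choices about covariates have to be made, before any estimator is run.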
I skipped some steps; you get my point. Well, here’s the deal. I feel so confident with csdid that I can spot problems just by looking at the output of all that. And I can rule stuff out too. If I see means in that outcome evolution, I know I can get “four averages and three subtractions”. If I have extreme values on X, I probably need to z-score it. CSDID’s R package ‘did’ actually does z-score, but as I showed in an earlier post, of the six packages I reviewed, not all do.
Where’s my point? It’s this:
First and most importantly, the returns to econometrics knowledge with Claude Code are actually higher, not lower. Because automation doesn’t mean verification. And you have to verify these results. You have to insist on zero error. Insist on zero error. Claude will not get into trouble when there are errors; we will get into trouble. We have to insist on zero error. And each person’s style of research is different. Everybody’s got a different workflow, a different production function, a different set of strengths and weaknesses. So you have to figure out a verification method that suits you and you alone. Which means what works for someone else may not work for you. Go back and read Ricardo on comparative advantage and believe it. There’s like 8 inputs in the production function of scientific output. We each have them and they range from zero to some large number, and if it’s zero, we need a method or a coauthor who doesn’t have a zero and we need to work well together. But even non-zero values don’t mean that you need Claude Code in the same way as someone else, and it definitely doesn’t mean that their style of verification is what will work for you.
The goal is not to have the perfect skills that we just pass around. The goal is zero error. That’s the goal, and if you make that the constraint and everything else an endogenous choice variable, you’ll get there. You just can’t put the cart before the horse. The constraint is zero error. Zero error is non-negotiable. Say it, believe it, practice it. And then you will (and so will I) develop the workflows that help you achieve it.
I’ll stop there, but I just wanted to say this because it has been building and building inside me. It has been building because I’ve felt like I was wrestling a greased pig with Claude Code just over csdid. And I only kept winning that fight because I had substantial human capital in the econometrics. I can’t imagine if I was trying to use Claude Code for discrete choice or for BLP. I would be losing! I wouldn’t have the foggiest idea of what a true or false looking answer is.
So I think it’s insane when people say we can automate applied econometrics. You know what I think when I hear that? I think I’m hearing someone who doesn’t do actual empirical research or doesn’t realize that there are tons of weird errors that Claude Code makes that are so dissimilar to the kinds of errors we make. They aren’t coding errors. They’re reasoning errors, and they are very hard to tease out when you are not in the driver’s seat.
