What follows is just a few thoughts I had about Claude Code, mostly based on spending a day working on an old project that I had done a ton of work on shortly after discovering Claude Code in mid November. I had been so amazed by what I found in mid November that I immediately turned to this other project and got a ton of work done. I then had to write up a draft and a deck to present it. The draft was insanely long, with never-ending tables and figures, and I never finished it because I had to move into end-of-semester exams. But this week is spring break at Harvard, and I’ve been slowly knocking stuff out. So I wrote this last night before bed and am posting it this morning.
Thanks again everyone for all your support. I appreciate everyone’s enthusiastic response to me talking about Claude Code and causal inference on here, both now and over the past few years. I’ve really enjoyed the motivation to keep studying harder and keep learning, and trying to get better at communicating what I know to other people. And this substack is partly where I do it. So thank you again. Consider becoming a paying subscriber! I set the price to the lowest possible price you can for substack ($5/month) and hope that it will be affordable. It’s a labor of love!
This may sound like I’m giving AI the side eye, but I’m not. I remain forever grateful for what I guess is software. And yet anytime the following happens, and it happens regularly, I’m inclined to take note of it and try to articulate it. Everything I say seems true enough that it may apply to anyone and everyone, but if not, I do think it applies to me.
It always starts with The Matrix, a timeless classic. There’s a scene where Neo lies back in a chair with a cable jacked into the back of his skull. He writhes, and after about ten seconds he opens his eyes and says, “I know kung fu.” Later he fights Morpheus and shows him. It’s as wonderful a series of scenes now as it was in 1999, when I saw it in the theater with my friends.
When ChatGPT-4 came out in the spring of 2023, it was nearly 25 years after the movie came out, and it felt like a promise I’d been given as an adolescent would be fulfilled. Meaning, ChatGPT-4 felt like I’d become like Neo. Not so much the promised messiah who would lead a resistance against the machines as just that I could learn anything I wanted without any effort. An assurance that I’d never have to work hard to learn something. I was just going to lie down and get plugged in, and all the things I wanted to know would come to me without any effort.
Nothing short of being given the power of flight could be better suited to my personality. I was a lifelong lover of learning and super powers, and the thought that I could fill this brain, not with facts, but with actual skills was deeply attractive. It had always taken me twice as long as my classmates to learn economics and econometrics, but I had always been the one among my classmates who wanted to paint the ceilings of cathedrals with economics and econometrics. So that gap between desire and ability always had to be filled with sweat and hard work. But that use of time always came with a hefty price tag: gaining the skills meant delaying my creative work until tomorrow, since today would be spent learning. And given how much I needed to know, it felt like the work would never come.
So I remember having this feeling with ChatGPT-4 that I could just know things from then on, and I could know them now, today, without any hard work. Want to know how to set up a Docker container? Boom. Want to understand the basics of optimal transport theory? Done. The entire corpus of human knowledge, all the skills of being an economist, uploaded into my brain, no sweat required.
What I think now is that one truth remains as much as it ever did: there is no more a free lunch now than there was then. There is no such thing as a free lunch. Gaining skills and knowledge always requires time. It always comes at a cost.
Here is the thing about learning: you can’t do it without breaking a sweat. Whatever it is that AI does for me in my quest for personal growth as an economist, I don’t think the right metaphor is me lying back, reclining in a chair, with a rod stuck in the base of my skull, having karate downloaded directly into my cerebral cortex. That isn’t the metaphor because it shows a passive person, engaging with AI while practically asleep.
I’m like 99% sure it’s closer to a physical law to say that just as you can’t build muscle without resistance, you cannot gain knowledge without resistance. You can’t build understanding without struggle. You cannot grow without a fight. And usually for the best things, it will be a bloody fight.
An AI agent can remove the struggle, and it can absolutely get cognitive tasks completed for you. There is no doubt about that. You can accomplish cognitive goals, complete cognitive tasks, and do so well, and not break a sweat. But that’s not the same as you learning. You can complete cognitive tasks and simultaneously not learn. And when that happens it’s one of two things. Either you have become very good at pushing buttons, in which case you may be overeducated for that job, truth be told. Or you have become the blind leading the blind without knowing it.
Often when someone says these things, they say them from a place of outright rejection of AI, but I don’t think that’s the case for me. I am still optimistic about AI’s usefulness, both for me and for society. But I also feel, just like I did the first day, that AI is like the siren, and if I can’t figure out how to close my ears to all its temptations and just continue on the same long march I’ve always been on, then I’m going to end up crashed against the rocks.
I believe that AI works profoundly well when used in areas where you already have substantial expertise, and it works in an incredibly jagged and uncertain way when used in areas where you have no actual comprehension. Which means that my own investments in my own skills remain key to getting the most out of it.
I have a paper that uses Callaway and Sant’Anna’s difference-in-differences estimator (CS), which by now I know pretty well. But I was applying it to something unusual. I had individual-level worker data on which to use CS. I had to re-envision what “time” means while sticking to the staggered adoption framework. I’m not going to get into the details here, but just know it was a strange enough application that the code couldn’t simply be lifted off the shelf. It had to be built carefully, but since I knew what I wanted, I knew I could do it with AI’s help.
The problem was, I hadn’t touched this project since 2025. It was one of those things on my plate that I kept meaning to get back to, and since coauthors kept asking for it and this week was spring break, I finally sat down to clear it off. I opened the directory and immediately felt that sinking feeling. The code looked way longer and more chaotic than I remembered. For instance, it was a medley of R and Stata files. There were graphics that didn’t look right, which meant I hadn’t done my due diligence to get all the kinks out. These days I don’t tolerate even the slightest irregularity in graphics, since for the first time I have someone or something that will fix it for me.
But back to the project folder. It was a sprawling folder structure that had clearly been used and reused for ten different purposes. I could tell that past-me had gotten a lot done using Claude Code, but I could also tell it was right at the very start of my using it, back when I was still figuring out how to work with it. The code had that feeling of ambitious ideas with questionable execution and not enough organization, which in my life has always been the recipe for disaster.
So I started using Claude Code to sort through it all. I told it: verify that every table and figure in the manuscript comes from replicable code, then replicate that code in R. That’s it. Don’t rewrite the paper. Don’t reorganize the directory. Just confirm the pipeline.
The first thing Claude did was run a code audit. Since a long time had passed and I clearly had never done a code audit, I was nervous. I got especially nervous when Claude became immediately convinced that my adaptation of Stata’s csdid command had not done what it should have done, since he couldn’t replicate it either with the R command or manually in R.
It claimed it had found a situation where one group of workers was coded as “never treated” when they were, in fact, eventually treated. That didn’t immediately seem possible to me: out of all the mistakes I could make, that one seemed unlikely, given that the whole point of CS is to not do that. But Claude was absolutely certain that this was the source of the contamination and that consequently the entire code needed to be scrapped and started over.
And in one sense he’s right. If I had miscoded this weird version of CS by using an already treated group as a control, then I’d be defeating the entire purpose of using CS in the first place, as CS is designed to not do that.
So it was a reasonable concern. The kind of thing that would sound completely right in a code review. And I definitely felt sick inside at the thought I had made such a basic, fundamental mistake.
But something felt weird about it. Maybe it was just that he was talking so fast, but I wanted to sit and reason together a little longer. So I kept pushing back. I told Claude he was confusing certainty with conjecture and that he needed to calm down for a second. Under no conditions was he to move on. He had to verify his conjectures for me at least three different ways, and since we had csdid, and I knew it worked, we had a ground truth to always check against.
Because I know this stuff pretty much like the back of my hand, I feel comfortable asking Claude to go through a series of steps, as opposed to him making up his own steps and walking me through them. And with diff-in-diff, since I know the calculations well, I usually want things done with borderline pencil and paper. Old school econometrics.
And he can do that. He can do old school econometrics. He can take four averages and subtract them so long as I take him through it. So long as I can grade his work. So long as I know how to recognize the problems in his work.
A lot of econometrics can be done with pencil and paper if you really can distill it to the most basic version of itself. You just have to strip away a lot of the extraneous stuff to get there sometimes, but many times it’s possible. So I often do that. I’ll make a dataset with four or five observations and try to manually do whatever it is that the estimator is doing, because I figure that if I can’t do it by hand, then I’ll almost certainly learn something that will usually solve whatever problem I was having. So that’s what I did here. I kept having him simplify, calculate and check.
At first that involved stripping away the irrelevant things, such as covariates. If, with no weird adaptation of CS, he couldn’t get the same series of ATT(g,t)s that you get from csdid without covariates, then that’s it: the problem wasn’t me, it was probably him.
Long story short, by forcing him to get down to the basics, which I knew well, and to keep drilling down to the most basic version of what we were working on, he eventually found his own mistake. His mistake was that the entire time, his “manual” Callaway and Sant’Anna implementation had never even been computing a difference-in-differences in the first place. He’d been going through all this back and forth with me and had only been calculating the between difference (treated mean minus control mean) as opposed to the between difference in the first differences. He had been doing a cross-sectional comparison and calling it CS. He’d been doing it in the context of this staggered environment, so I guess he was distracted, but that wasn’t really an excuse for making that mistake. I mean, that was a pure zero on the exam. That was downright embarrassing. He knows CS too, is the thing! The method is literally called “difference-in-differences”! There’s a difference that you difference! But for some reason on this day, he didn’t know it.
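To make the mistake concrete, here is a minimal sketch of the two calculations on a made-up panel. The data, the group labels, and the function names are all hypothetical, invented only for illustration; this is not the code from the project. It assumes one cohort first treated in period 3 and one never-treated group, with no covariates.

```python
from statistics import mean

# Hypothetical toy panel: (unit, group, period, outcome).
# Group "g3" is first treated in period 3; group "never" is never treated.
rows = [
    ("a", "g3", 2, 10.0), ("a", "g3", 3, 15.0),
    ("b", "g3", 2, 12.0), ("b", "g3", 3, 17.0),
    ("c", "never", 2, 4.0), ("c", "never", 3, 6.0),
    ("d", "never", 2, 6.0), ("d", "never", 3, 8.0),
]

def cell_mean(group, period):
    """Average outcome for one group in one period."""
    return mean(y for _, g, t, y in rows if g == group and t == period)

def cross_section_gap(treated, control, t):
    """Treated mean minus control mean in period t (NOT a diff-in-diff)."""
    return cell_mean(treated, t) - cell_mean(control, t)

def att_gt(treated, control, t, base):
    """Four averages: change for the treated minus change for the controls."""
    treated_change = cell_mean(treated, t) - cell_mean(treated, base)
    control_change = cell_mean(control, t) - cell_mean(control, base)
    return treated_change - control_change

print(cross_section_gap("g3", "never", t=3))   # level gap between groups
print(att_gt("g3", "never", t=3, base=2))      # difference in first differences
```

In this toy example the two numbers differ because the groups start at different levels, which is exactly the part the cross-sectional comparison never differences out.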
There were other signs I should have caught earlier. At one point Claude was convinced the estimated effects were invalid because the code wasn’t using the “universal baseline” option. But the universal baseline only matters for pre-treatment coefficients; every post-treatment ATT in Callaway and Sant’Anna uses the same long difference calculation from the fixed baseline in the last pre-treatment period (g-1 in their notation). I know this because I teach this constantly.
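The baseline point can be sketched as a small lookup. The function below is a hypothetical helper, not part of csdid or of any R package; it encodes my understanding of the “varying” versus “universal” base period conventions and assumes no anticipation.

```python
def base_period(g, t, universal):
    """Comparison period for ATT(g, t), assuming no anticipation.

    g: first treated period for the cohort; t: period being estimated.
    universal=True mimics a universal baseline; False mimics a varying one.
    """
    if t >= g:
        # Post-treatment: always the last pre-treatment period.
        return g - 1
    # Pre-treatment: this is the only place the two conventions differ.
    return g - 1 if universal else t - 1

# Hypothetical cohort first treated in period 5, estimated for periods 2..8.
for t in range(2, 9):
    print(t, base_period(5, t, universal=False), base_period(5, t, universal=True))
```

Every row with t at or after 5 shows the same base period under both conventions, so switching the option cannot change a post-treatment estimate.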
He was convinced the problem had to do with the C++ plugin that R was using for calculations, which sounded smart and fancy enough of a story that I would’ve believed it had it not been in the one area where I felt like I had substantial skill. That story doesn’t explain struggling to take a mean for a group. It sounded more to me like he was making a fundamental mistake, that maybe he was getting the complex aggregations right but something more basic wrong. Which he was.
And the weird thing is, Claude also knows this. He knows what diff-in-diff is. At a deep level, he knows it. But it’s also the case that he only sometimes knows this. The problem is that regardless of whether he actually knows it, Claude says it with exactly the same confidence either way.
I’ve seen this pattern before, both in myself and in someone else. A person who had attended one of my workshops once called me on Zoom, excited to share something he’d learned from a reasoning model. He said doubly robust estimation lets you use different covariates in the outcome regression than in the propensity score model. I had apparently told some people that you should use the same covariates in both, and he wanted to push back on me.
I guess it wasn’t wrong, per se. Doubly robust just requires one of the models, not both, to be correct. But still, it struck me as strange, because the role of covariates in diff-in-diff is to impute counterfactuals through the conditional parallel trends assumption. If you need the covariates for that, why are you moving them into and out of the models differently? Presumably you need them to satisfy conditional parallel trends, which both the outcome regression model and the propensity score model rely on for their calculations to be right in the first place.
I told him I wasn’t sure about doubly robust practices in general, but I had probably been talking about Sant’Anna and Zhao (2020) specifically, where the doubly robust estimator has a particular structure. While you technically can use different covariate sets (I mean it’s a free country; you can technically do whatever you want, especially when things are done in two stages), it’s not clear why you would if your goal is satisfying the conditional parallel trends assumption, which needs all of those covariates in the first place.
So then I looked at his code and saw what had actually happened: the reasoning model had told him to just include the propensity score variables as covariates inside a two-way fixed effects regression. They weren’t being used as weights applied to the means in his code, first of all. And he wasn’t fitting an outcome regression model, regressing the first-differenced outcome onto baseline covariates for the control group, anywhere. He was just “controlling for” covariates, sometimes twice and sometimes once: inside a propensity score and/or on their own, and then inside a regression additively. There were many things wrong with the specification, but you could only know that if you already knew what you were talking about.
The LLM had probably confidently given him that code and an explanation behind it, which he’d then used. Shortly after, he wrote me back and said I was right.
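For contrast with “controlling for” covariates inside a regression, here is a rough sketch of where the two pieces enter a doubly robust diff-in-diff in the spirit of Sant’Anna and Zhao (2020). Everything in it is an assumption made for illustration: the data are made up, there is a single binary covariate, and simple cell averages stand in for the logit propensity score and the linear outcome regression that a real implementation would fit.

```python
from statistics import mean

# Hypothetical units: (treated, x, delta_y), where delta_y = Y_post - Y_pre
# and x is a single binary baseline covariate.
units = [
    (1, 0, 3.0), (1, 1, 6.0), (1, 1, 8.0),
    (0, 0, 0.0), (0, 0, 1.0), (0, 0, 2.0), (0, 1, 4.0),
]

def cell(x, d):
    """First-differenced outcomes for units with covariate x and treatment d."""
    return [dy for dd, xx, dy in units if xx == x and dd == d]

def pscore(x):
    """Stand-in propensity score: share of treated units within the x cell."""
    return len(cell(x, 1)) / (len(cell(x, 1)) + len(cell(x, 0)))

def control_trend(x):
    """Stand-in outcome regression: mean change among controls in the x cell."""
    return mean(cell(x, 0))

def dr_att():
    # Treated piece: change minus the predicted untreated change.
    treated_part = mean(dy - control_trend(x) for d, x, dy in units if d == 1)
    # Control piece: the same residual, reweighted by the odds p / (1 - p).
    odds_and_resid = [
        (pscore(x) / (1 - pscore(x)), dy - control_trend(x))
        for d, x, dy in units if d == 0
    ]
    control_part = (sum(w * r for w, r in odds_and_resid)
                    / sum(w for w, _ in odds_and_resid))
    return treated_part - control_part

unadjusted = (mean(dy for d, _, dy in units if d == 1)
              - mean(dy for d, _, dy in units if d == 0))
print(unadjusted, dr_att())
```

The propensity score shows up only as a weight on the control units, and the outcome model is fit only on the controls’ first differences. Neither appears as an additive regressor, which is the structure that was missing from the code described above.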
The point I’m making is simple, and I’m not the first to say it. When you know your domain, the AI agent is like a rocket strapped to your back. You fly fast and in a straighter line at the targets. You might as well be teleporting there. The things I can do now in a few hours would have taken me days or weeks before. Claude handles the tedious parts (the LaTeX formatting, the file management, the boilerplate code) while I focus on whether the research design is right. It’s genuinely transformative.
I think the thinnest ice comes when you don’t know the domain very well and you’re using AI to teach it to you during the actual coding of the project itself. I think that often works very well, but in creative, advanced work, if you are really trying to do this with almost no actual background in the subject matter, then I think it can go off the rails fast without you ever knowing. Not necessarily doomed, but in real trouble. Because the AI will do things quickly and confidently, and you won’t have the vocabulary to interrogate it. You won’t really see the very specific problems. With CS, it’s usually these little details that I have just learned to notice: I know when two estimators’ output should look nearly identical, and when it shouldn’t. So the moment they don’t, even if there’s a snow drift of information I’ve been getting, just that one fact is enough, and I can filter out the rest and get on it.
The problem, I think, is that you’ll get output that looks professional. And maybe even worse, Claude will hammer at code until that code runs. When I’m wrong, my code usually breaks, and in getting it to run I actually succeed because I learn. But here, the completion of tasks doesn’t really depend on me, and you can get code to run while the calculations it’s doing are completely wrong, and neither you nor it knows that day.
So all of that is to say I think we are not yet at AGI. We’re at something else, and I love where it is, and it has completely transformed my life both personally and professionally. I’m totally insecure about the future, like most everyone else, but I am also excited and happy to be part of it. But I still think, all said and done, that the place I’ve seen really cool things is in areas where I’ve already established real expertise. And so I still worry all the time: am I going to be in the future without the ability to spot these kinds of things because I rely on him to do it? Just like physical capital depreciates, so does human capital, and maybe even faster.
This isn’t a blast against AI though. That genie is out of the bottle. We will never go back to the way it was. Our work will be infinitely better going forward. The number of papers that fail to replicate is likely to collapse into a small dot given the sheer number of eyes that’ll be on them. The wisdom of AI agent crowds is coming. But I still think we have to be vigilant about protecting and maintaining our human capital, not because of some allegiance to humanity. I just don’t think these technologies work best when you are literally the most uninformed version of yourself you can be.
