Monday, May 11, 2026

What I am studying from giving 4 AI talks in two weeks


The final two weeks have been an actual whirlwind. I offered at Harvard Medical Faculty early final week, gave the keynote at Georgetown’s McCourt Faculty of Public Coverage school retreat final Thursday, spoke on the Harvard Kennedy Faculty earlier this week, and yesterday I used to be a visitor speaker in Alberto Abadie’s “Subjects in Econometrics” class at MIT. 4 talks, 4 completely different audiences. There have been well being oriented scientists, theologians, philosophers, political scientists and economists of an enormous vary, to not point out many school and plenty of college students. And roughly the identical underlying materials every time: what AI is doing to the probabilities of analysis observe of empirical analysis.

However I don’t wish to use this put up to recap the talks. I wish to use it to write down down what I’m studying from giving them, as a result of essentially the most attention-grabbing factor about doing this many of those in a row is that the audiences maintain instructing me what they should see, and what they should see retains altering my sense of what the demo is definitely about.

For the Kennedy Faculty speak I constructed the demo round a paper by Michael Kremer and Dan Levy entitled “Peer Results and Alcohol Use amongst School College students,” revealed within the Journal of Financial Views in 2008. Let me first inform you in regards to the paper earlier than I inform you in regards to the theatrical stunt that I used the paper for.

The setup within the paper is gorgeous. A big Midwestern college ran a housing lottery that randomly assigned freshman roommates conditional on a couple of preferences. Some college students arrived having reported on a survey, earlier than school, that they drank in highschool. Others reported they didn’t.

The discovering is that male college students who bought randomly assigned a roommate who had drunk in highschool misplaced roughly 1 / 4 of a GPA level, which is about as a lot as a 50-point drop on the SAT. The impact was concentrated amongst college students who had been themselves already drinkers, and it was worst on the backside of the GPA distribution that are clearly the scholars who may least afford such adverse shocks.

And right here is the half that lands hardest: the impact grew bigger within the second yr, even after the roommate had moved out. Regardless of the mechanism was, it wasn’t noise or disruption, however relatively one thing like a sluggish shaping of a teenager’s preferences by the particular person they occurred to be sleeping ten toes away from. The college selected who you lived with, and that selection rippled by the remainder of your time there.

That may be a fantastic paper to make use of for a public speak as a result of the causal logic is clear sufficient to show in 5 minutes and the human stakes are apparent to anybody who has ever lived in a dorm.

Besides that I didn’t educate it. I didn’t say a lot in any respect about it.

Here’s what I really did at Kennedy. I had not ready slides. I confirmed those who the folder entitled “Kennedy” was really empty. It was roughly meant to be like when a magician asks a volunteer to evaluation a deck of playing cards and make sure that there’s nothing fishy with it for the group in order that the group may confirm that there was nothing fishy with the deck too. Even higher would’ve in all probability been to choose randomly on somebody within the crowd and simply ask them to offer me a paper — any paper — and I exploit it as an alternative.

I wish to be clear about that as a result of the entire level trusted it. I opened my laptop computer in entrance of the viewers, pulled up Claude Code in a terminal, and gave it one immediate typed reside, stuffed with typos as a result of I used to be speaking and typing on the similar time. I used the “- – dangerously-skip-permissions” modifier for claude from the CLI in order that this is able to all run within the background as I went by my speak, which was once more a part of the suspense that I hoped would maintain the median viewers member engaged as they patiently waited for the end result. Right here was the immediate:

SO i'm at the moment giving a chat on this paper on the kennedy college, and i want a few favors. primary i need you to make use of my /split-pdf and break up the pdf into as many 4-page pdfs as you may after which write a abstract per break up fow hat you discovered what t is about, why we shoudl care who the viewers and many others. after which when you ahve all these markedown summaries, take one other agent and summarize that right into a single huge throgouh markdown. Then, i need you to learn my rhetor ic of decks essays and myaristotean essay in regards to the rhteroci of beamer slides after which use my /beautiful_deck talent to make a fantastic deck in beamer that's aesthetically very fairly, and is meant for an audiecnc eof clever layperople with out a background in alcoho coverage and school peer results. this deck might want to transatel all f ther regression tables into image utilizing R and .png and so they should be correct and really attention-grabbing. in order that menas you’re going need to simulate the outcomes that dan had and it acturally match the true outcomes. and make it 20 slides. make it fairly. have a narrative. lead with narrative, lead with info. and make it cool

After which I hit enter, let it sit there a second so they might see it get fired up, after which I went to my speak. I didn’t return for an additional 15-Half-hour I feel.

Within the background, sub-agents spinning up. I had invoked my /split-pdf talent inflicting the Levy and Kremer (2008) PDF to be break up into N pdfs of equal size (4-page pdfs). Markdown notes (for every of the N pdf splits) had been written, with one further markdown that upon completion swept by the N markdowns and wrote one huge markdown abstract. R scripts had been drafted, then run, then debugged, all whereas I spoke. LaTeX beamer code was written, compiled, overfull/overfill/hbox/vbox errors had been recognized, fastened, then compiled once more — a course of that was repeated till there have been no extra compile errors, not even so-called small beauty ones. In my expertise, they’re by no means small, so I tolerate zero of them. A customized shade palette was chosen. A frame-title fashion was designed. A visible audit move ran over the figures in search of label collisions. After which 15-Half-hour later there was a “lovely deck”. We opened it. It was not visually good, as a result of I nonetheless haven’t perfected my talent to eradicate all of the tikz and .png associated visible errors, however nonetheless, it was fairly good for somebody who was extracting coefficients and normal errors and confidence intervals and creating their very own information visualization primarily based on them.

Right here is the deck on your perusal too.

Share Scott’s Mixtape Substack

You’ve gotten in all probability seen the opposite parlor trick with AI the place somebody in entrance of you has Claude Code write a completely autonomous program analysis manuscript, however I feel what I did is extra impactful on a skeptical viewers, and I wish to now clarify why I really feel this fashion.

There’s plenty of, let’s name, “Luddite” anxiousness across the automation of cognitive duties utilizing AI Brokers. Luddite within the historic sense — deep ethical opposition to machines due to the job stealing nature of them. Insofar as AI can automate analysis, begin to end, then what’s the function of a PhD? What’s the function of the human? Do you even want a PhD to push the buttons? Are we overpaying, at that time, the button pushers?

So I feel there’s a temptation, once you’re making an attempt to indicate individuals what AI can do for empirical work, to indicate it doing the factor they fear about most: writing a paper. Which I feel is a hazard as a result of when they’re morally against that performance, they’ll psychologically turn out to be engaged with battle, flight or freeze responses, none of which is fascinating from my perspective. Plus, even when it did write autonomously the manuscript in entrance of them, the opposite situation the place the standard of it can’t be assessed, to not point out the challenges of presenting it to a big crowd. You can not, in different phrases, simply present them a manuscript, so who is aware of what’s in there.

However you may present them a fantastic deck, and I don’t assume decks set off the Luddite repugnance the way in which manuscripts do. For one, we share decks already. Publishers ship us pattern decks (they don’t ship us permission to plagiarize the textbook although each are cognitive output). They’re very beneficial, extremely time intensive, and are continuously very unhealthy. They take up plenty of a professors’ time, they’re considered one of two modes (the opposite being the manuscript) describing the way in which researchers speak to 1 one other. So instantly they’re acknowledged as truthful floor, or moreso at the very least, and they are often instantly verified. Plus, my immediate was very particular — discover how a lot course I had over it.

Because of this I felt that the Kennedy demo labored higher than a model of “watch Claude write a paper”. It isn’t much less bold to write down a deck, however I feel it’s important to at the very least considerably acknowledge that recreating regression tables as simulated figures once you don’t have the precise micro information and but are instructed to simulate it utilizing R from the revealed coefficients is, on a technical stage, a wildly arduous job, if not a seemingly unattainable job, and arguably a extra demanding one than writing a bunch of phrases. It’s simpler for the viewers to obtain, exactly as a result of they’re not being requested to render a verdict on a analysis artifact. Slightly, they’re being requested to look at a presentation come into existence in twenty minutes, after which watch a human — me — rise up and provides it. And for the reason that task was to transform regression coefficients and normal errors into one thing visible, they might additionally see that this was not some wood factor. Slightly it was requiring discretion and decisions, and notably, discretion and decisions inside the rhetoric of decks constraints. What, in different phrases, would work effectively for an viewers? That was implicit within the task.

I feel that is the appropriate register for lots of AI demos. Discover the type of work the place the AI is the medium and the human is the performer. Present that. Not “look what it wrote with out me.” Present “look what we did collectively, in entrance of you, in actual time.”

The opposite factor I’m studying from this run of talks is extra sensible, and I owe it to a PhD pupil named Theo (Hey Theo!) who got here as much as me after the MIT class yesterday.

Dan Levy emailed me immediately and mentioned he had spoken with Michael Kremer, the coauthor on the paper, and instructed him in regards to the Kennedy demo and that Kremer was very intrigued. He questioned if he may see the deck and a recording of the seminar, and I fortunately mentioned sure.

However Dan additionally requested if I may ship him the immediate I’d typed. I sat down to write down the e-mail and realized, with some sinking feeling, that I had typed that immediate instantly into the terminal utilizing CLI throughout a reside speak, two days earlier, after which closed the session. So presumably it was gone.

However then I remembered one thing the MIT PhD pupil instructed me yesterday. After my speak, Theo instructed me that it was essentially obligatory to continually maintain progress logs as a result of Claude already retains them in JSON information on my laptop computer. Once I began asking extra about that, he smiled and mentioned what I instantly knew myself — simply ask Claude and he’ll inform me .

So I did. I requested Claude if he may look across the Kennedy folder and discover the immediate, that I heard it was in some hidden .claude or ~/one thing factor, based on Idea.

And Theo was proper. It was there. My whole session with Claude at Kennedy was sitting in a JSON file.

Which signifies that each session I’ve ever run with Claude Code can be sitting on my laptop computer. Each immediate I’ve ever typed. Each instrument name Claude has ever made on my behalf. Each sub-agent it has spawned. Each file it has learn or written. Each Bash command, each R script, each LaTeX compile, each error, each retry, each edit.

So once I had him evaluation the Kennedy folder and the JSON file, I discovered that the Kennedy session was 445 messages lengthy and three.7 megabytes of structured textual content. My unique immediate was the very first line, typos and all, faithfully preserved.

I used to be in a position to ship Dan the verbatim unique. I used to be additionally in a position to ship him your entire working folder, which you’ll be able to see right here, with the supply paper, Claude’s studying notes, the design doc it wrote earlier than any code, the seven R scripts that produced the figures, the LaTeX supply, and the total session transcript. The entire thing. Reproducible. Auditable. Out there to him and Michael and any AI agent they wish to level at it.

This issues greater than it seems like. One of many quiet anxieties about AI in analysis is that the work occurs in a black field, that if I ask the mannequin to do one thing and it does it, and I’ve a outcome, that doesn’t due to this fact imply that I’ve a file of how the outcome was produced.

However that’s not really true. The file exists. It’s sitting on my machine. I simply need to know the place to look, and that no one had instructed me, and now I’m telling you. The trail on macOS, in case you wish to have a look at your personal:

~/.claude/tasks//.jsonl

Open one. It’s a startling factor to learn your personal session again from the skin. You see the agent considering, the sub-agents reporting again, the dead-ends, the corrections, the second a determine was lastly rendered accurately.

You would possibly even have the ability to detect issues like whether or not the agent was p-hacking due to this diary it stored.

It’s the closest factor I’ve ever seen to a flight recorder for thought.

4 talks, two weeks, 4 completely different audiences. What I maintain noticing, throughout all of them, is that the model of the demo that lands is the model the place the AI is doing one thing that seems unattainable proper up till the second it really works. Writing a paper doesn’t look unattainable to a analysis viewers as a result of they’ve learn sufficient mediocre papers to know that the bar is low. And it takes greater than a few minutes to judge the standard of a paper, and it’s fairly straightforward to dismiss one if you’re considerably biased towards AI automated analysis anyway. So I don’t suggest that when you find yourself making an attempt to indicate individuals about AI brokers. I feel, as an alternative, think about the deck — the gorgeous deck.

Constructing a fantastic, unique, narratively-arced presentation a couple of stranger’s paper, with simulated figures trustworthy to the unique tables, within the twenty minutes earlier than the speak begins seems and appears like an unattainable magic trick. They’re going to assume the cube are loaded or that you’ve an ace up your sleeve as a result of that’s the response to magic tips. Why? As a result of when a magic trick is completed effectively, your mind can not course of it, and due to this fact you attain for a lot of explanations.

Properly, that assumes that the trick landed. And that’s the reason you do this type of “lovely deck” demo — as a result of it, in contrast to an automatic paper, will land. It lands as a result of they will from their seats see the success or lack thereof. And when it’s finished on a well-known revealed research, then additionally they can instantly confirm its accuracy or not.

It’s the type of job the place the viewers leans ahead not as a result of they’re skeptical however as a result of they genuinely can not think about how the subsequent step will occur, after which it occurs.

So that’s the demo I’m going to maintain working. Not “the AI wrote a paper.” Slightly, the “watch this factor come into existence in entrance of you, after which watch me give the speak it simply made.” The AI is the medium. I’m the rhetor. The work is shared. And, because it seems, each single keystroke is preserved.

Related Articles

Latest Articles