It’s hard to believe that it’s April, that I’ve been using Claude Code since mid-November and feeling like I’m ahead of the curve, and that it already feels like time to dedicate a solid week to retooling. But that’s how it feels now that I’ve found Paul Goldsmith-Pinkham’s ongoing video series at Markus Academy. I wanted to take this opportunity to suggest that you watch it. It’s free, Paul is a great communicator with great presence, and he is a great writer, so his substack shows a lot of thoughtfulness in how he translates the video series into essays. The content runs from beginner material all the way to a much more advanced set of material. So let me share this with you now, as the purpose of my Claude Code series has been, all along, to help people gain some skills and insight into using Claude Code through essays, code, and video walk throughs, and that should therefore include pointing people to resources wherever possible.
Not everyone knows who Paul is, so I’ll just share who I see Paul to be, given that I do not know him well but have interacted with him somewhat online.
Paul Goldsmith-Pinkham, for those who do not know, is an applied econometrician specializing in topics in finance and causal inference. He’s the author of several highly impactful papers in causal inference, which is how I personally know of him, like his work with Isaac Sorkin and Henry Swift on Bartik instruments, or “shift-share IV”, published in the American Economic Review a few years ago. He also wrote another paper in the AER with Peter Hull and Michal Kolesár on the interpretation of ordinary least squares regressions in causal inference when there are multiple treatment categories. But he has also written a great deal about consumer finance, which I don’t know much about, so I won’t attempt to say which of those papers are more important than others. I’ll just say that from a distance I’ve been able to discern that they’re often very impactful papers in that field, and that he is one of the more competent and respected young scholars working today, at least in economics, and I don’t think that’s a controversial opinion. If anything, it’s most likely understated. And he has this newsletter, A Causal Affair, which Paul has been more engaged in since Claude Code appeared. He has also been more engaged in it as the #EconTwitter era has waned.
One of the things that Paul is also very good at is, broadly speaking, programming. I say programmer, and not merely a coder, even though coding is the normal way of describing a good programmer, because I think Paul is borderline a computer scientist. I’ve noticed that among millennials there are many more right tail excellent programmers than I seemed to see in my Gen X generation of applied microeconomists, too. More and more, you find young scholars who came out in the last, say, dozen years who are somehow centaurs when it comes to modern scholarship. They’re excellent social scientists with, as they say these days, “good taste”. That is, they recognize good ideas with good upside. People claim LLMs cannot (yet) mimic the best humans in that regard, and if that’s true, I would say Paul probably fits in that rarefied company of people with excellent taste.
They also come out with extremely high levels of sophistication in applied statistics, broadly, to the point that they’re probably statisticians themselves even if that’s low key not necessarily what they appear to be. How else can you explain the kind of work he has done consistently in causal inference? Over time, when you watch Paul show up consistently as a coauthor with such people as Guido Imbens, Will Dobbie, Crystal Yang, Peter Hull, Michal Kolesár, and Isaac Sorkin, just to name a few, you can start to discern what you might call the “Paul fixed effect”. Whether he’s working with established econometricians, like Guido, or a group of applied folks, the papers he’s on consistently show a very high level of econometric creativity and thoroughness, usually doing more than merely using the method, very often extending it, and even placing it on much stronger footing. How else can you interpret something like the contamination bias in linear regression paper? You would think that a couple hundred years after Gauss first wrote out the procedure of minimizing the sum of squared residuals, done in order to track a comet behind the sun as a wee little teenager in the late 1700s, we would be done trying to crack open the regression to figure out what it does. But then Paul and his team come along and we realize that that’s not the case, and that there’s more to learn.
So that’s a full blown empirically oriented social scientist with the full blown depth of an econometrician and statistician.
But then there is, and this is the part that I see as more modern than ever before, the computer scientist, for lack of a better word, in his skill set. I don’t think we associate that skill with the history of economics. I don’t think we would have looked at Coase, Milton Friedman, or even necessarily Imbens, Angrist, Orley, maybe even Heckman too, and said that they weren’t just econometricians but had a mastery of modern computer hardware, infrastructure, architecture, and so forth, the way you see more and more among the younger cohort.
He isn’t alone in being like that. You often see the tech companies gobbling up people like Paul, though. Grant McDermott is another person like Paul in some respects, who was once a tenure track assistant professor at Oregon but is now a principal scientist at Amazon, though I think it’s fair to say Grant is not an econometrician like Paul is. Kyle Butts, my friend who runs Mixtape Sessions, would perhaps be a very similar person to Paul in a roundabout, holistic way. Pedro Sant’Anna is another such person, as are Andrew Baker and Brantly Callaway. But notice that these people are millennial aged, for the most part, and when you’re an old man like me, you can start to sense the fault lines a bit more because there’s just more mass on this combination of social scientific taste, pioneering econometrics, and computer science.
I would probably add that I see a lot of this in Jeff Smith too, who is older than me and a Heckman student, and even in my advisor Christopher Cornwell, Christopher Baum at Boston College, and others too, so it’s not as if this has not always existed, particularly among econometricians. They have consistently been the ones who have been excellent at computer science tools, as well as taste and econometric theory. So maybe in saying this about millennials I’m just speaking anecdotally, though I do sense that there are for some reason more applied econometricians who are brilliant centaurs in that they could easily pass as computer scientists if they wanted to, and Paul is such a person.
And therefore it has not surprised me to see Paul move to the frontier of expertise in using AI agents. That is largely only detectable at all because Paul has chosen to continue to be a public educator, so to speak, helping others by teaching, through videos and online writings. You saw that too in a teaching series he did on his website where he shared his teaching materials in econometrics, and also somewhere (I can’t find it right now) in a video series he did during Covid actually teaching this material as well.
So that’s enough background I think, even though it’s a thumbnail sketch.
Paul has been doing a series for Markus Academy on Claude Code aimed at the beginner to intermediate user. Which is to say that Paul has been doing a Claude Code series for the curious who are desperately committed to retooling as quickly and deeply as they possibly can. Markus Academy is a substack by Markus Brunnermeier from Princeton, and it hosts conversations with academics and policymakers on a variety of topics, including artificial intelligence, and I recommend that substack as well.
Markus recently invited Paul onto his Academy series to talk about Claude Code. “Talk” is not quite the right word, though, as that makes it sound like a round table conversation, like maybe an economist version of Hot Ones (which admittedly would be a fantastic show).
And there are for sure elements of that, though toned down, as Markus does play the part of the curious economist eager to learn more about Claude Code and AI agents, but for the most part it’s Paul running an online class on Claude Code.
It is very hard to do something like this tbh. It requires a lot of forethought because, in reality, this software, if that’s what it is, is both easy to learn and difficult to communicate to others in a standard way, since a lot of it is just speaking in plain English through text prompts into the Terminal command line interface or desktop app in front of other people. I’ve done it a few times, both on here and in public, and very quickly the talk can run over on time. I recently spoke to the Federal Reserve’s Board of Governors, for instance, on Claude Code. I was allotted 60 minutes to talk about it and went for 90 minutes, and while that’s par for the course for me in some respects, it feels much more challenging to do a high level, practically guided class on Claude Code in a standard format than many other things. Plus it’s really not clear just what you should assume about the audience, where to start, and where to take people.
And this is where Paul has really shined, because on his substack he’s shown that he knows exactly where to start, how to be innovative, and how to teach. He wrote a very interesting Substack post not too long ago, for instance, suggesting that applied social scientists should accompany their papers with a translated markdown file he called LLM.txt, or maybe it was LLMs.md. Either way, it was a standardized, formatted alternative working paper, so to speak, designed explicitly for large language models to read. They often cannot parse PDFs consistently (particularly figures, since they have poor spatial reasoning) nearly as well as simple text files, though they do a great job for something they are not great at. But even then, the information that would be best for them to learn from your work is likely not the exact content of a manuscript produced by humans for humans to read. Large language models prefer to read text files structured in a particular way, and so Paul translated earlier computer scientists’ writings about that for applied workers.
But in the Markus Academy series, which is currently four parts with probably more coming, it looks like, he has both recorded himself doing video walk throughs with Markus (and the assumed audience watching) and written up excellent summaries that stand alone on his substack, and the series starts at the right place. The first one is “Getting Started with Claude Code: A Researcher’s Setup Guide”.
Another interesting thing he did, which first caught my attention, was focused on safety, particularly building what is called a container inside your machine that will allow you to safely experiment with Claude Code without breaking your computer. I’ve been quite reckless with my own experiments, sort of running face first into the wall repeatedly to try to learn what AI agents can do for me. Paul, being more of an actual bona fide computer scientist than me, and probably more careful too, has tended to see better the hot spots and landmines, but also the opportunities to do things efficiently and well immediately. He wrote that really good piece, complete with a repository, to help those who are not willing to, as my so-called “friend” Andrew Baker likes to remind me of my own numerous mishaps, “bash their computer into oblivion”, or some variation of that leveled at me. (I come from the scientific tradition that doing no harm requires volunteering to do harm to oneself to figure out if something works, though Baker would probably see this more as me being me and not principled.)
Anyway, back to Paul.
Notice that one of the things Paul is doing in this series is not explaining Claude Code to and for engineers, but rather explaining Claude Code to and for applied folks. Applied meaning the types of social scientists who live in folders and directories and run regressions on spreadsheets of numbers. That’s not the only kind of applied folks, and not even the only kind of empiricist, which is why I qualify it. And in this first video and substack, Paul’s talk is very much focused on the beginner who is timid but wants to get things up and running.
The second video is a standard one. It’s titled “From an Empty Folder to a Figure using Claude Code”. Paul, I think like me early on in my series, recognizes that one of the things that characterizes modern research is the folder. If your work exists in multiple folders and several files on your computer, then we can call it research, and therefore Part 2 in his series is for you. If your work doesn’t exist in multiple folders and several files on your computer, then it’s probably not the kind of research that Paul is focused on helping you with, and therefore you may want to skip this one.
The idea of starting with the empty folder, and then making a figure, immediately gets to what I see as the real killer app, though, for Claude Code and practitioners. If you want to really captivate hearts and minds, in fact, you should have Claude Code actually manage your folders entirely. And if you are really brave, you’ll have him completely rearrange your folders. I have a project with 2,000 files and 14 GB right now that I’m using Claude Code intensively on, and I am still low key embarrassed by it and afraid he’ll mess it up even more, which constantly makes me think I may be teetering right on the edge of becoming a hoarder.
But put that aside. This idea of using Claude Code to make “beautiful figures” is really at the heart of what I see as something that people should take seriously. Why? Because figures are at the heart of the modern era of social scientific research because data visualization is at the heart of social scientific research because data is at the heart of social scientific research because applied statistics is at the heart of social scientific research because computers are at the heart of social scientific research. Paul has always had a great eye for making “beautiful figures”, and you can tell because he was an early, enthusiastic consumer of Kieran Healy’s excellent book on data visualization. And in the video series on Markus’s substack, Paul actually at least once asked Claude Code to make a figure like how Kieran Healy makes them. Healy is sort of the successor to Edward Tufte in many ways for the data visualization of quantification.
Caitlin Myers and I do this too on our podcast. We have Claude Code make “beautiful figures” from the datasets that we’ve been working with, and they consistently astonish us. Even Caitlin, who is borderline the Michelangelo of data visualization for pushing hard on what she sees as the rhetoric of pictures, has been astonished by what Claude is capable of. See here the part where she sees for the first time Claude Code’s rendition of a marriage series we had created.
So teaching a class where the first rhetorical punch is to move from the empty folder to the production of a figure, which obviously has in between Claude collecting data and populating the directory with it, is a great idea. I do it too because if people see that, they will be impressed, and I think it’s an important thing because, as I was saying before I lost my train of thought, making beautiful figures is:
And so Claude Code having the ability to make us all become really good at that is, I think, one of the really valuable things it offers as a gift to the community. (Another is just making excellent decks, which is another thing I emphasize a lot on here, including my “rhetoric of decks” refrains.)
But then his third video and substack is where he really focuses on making a structured database from what I think is a common dataset for those in finance, EDGAR filings. This is where he shows “text-as-data”, which I guess I for some reason want to point out has been one of the first things I used Claude Code for too, both in personal research (on a big scraping project I did all of December and January this year) and on here in a series of videos showing how to analyze Congressional speeches and have them classified at OpenAI using gpt-4o-mini.
More specifically, Paul shows himself building a research pipeline entirely through Claude Code that scrapes SEC EDGAR filings, extracts the Risk Factors section (Item 1A) from 10-K annual reports for about 30 trade-exposed companies, and organizes everything into a structured DuckDB database. The motivating question is, as I’ve been saying, entirely empirical too: did companies change their formal risk disclosures in response to the 2025 tariff escalation? But rather than working with data that’s already clean and tabular, he’s showing how to go from “the information exists somewhere on the internet” to a queryable, joinable research dataset, which he argues is usually the hardest part of any text-as-data project.
What makes the post interesting methodologically is how Paul walks us through a process of having Claude Code handle the messy, real-world parts of this task. It enters plan mode before writing any code, asks Paul clarifying questions about design decisions (database format, keyword approach, authentication headers), and then builds a 480-line Python pipeline with caching, error logging, and extraction quality reports built in. When things go wrong, like a mismatched ticker for Gap Inc. or a regex failing on Honeywell’s formatting, Paul shows that Claude investigates, fixes the issue, and re-runs only the affected files rather than starting over. The pipeline successfully extracts Item 1A from 119 out of 120 filings. And this is all done live with Markus, and explained well in the substack.
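To make the extraction step concrete, here is a minimal sketch of pulling Item 1A out of plain 10-K text with a regular expression. This is not Paul’s pipeline: the function name `extract_item_1a`, the pattern, and the toy filing are all assumptions made up for illustration, and real filings (as the Honeywell hiccup suggests) need far more forgiving patterns.

```python
import re
from typing import Optional

# Hypothetical pattern: the "Item 1A. Risk Factors" heading, then everything
# up to the next "Item 1B" or "Item 2" heading.
ITEM_1A = re.compile(
    r"item\s+1a\.?\s*risk\s+factors(?P<body>.*?)(?=item\s+1b\.?|item\s+2\.?)",
    flags=re.IGNORECASE | re.DOTALL,
)

def extract_item_1a(text: str) -> Optional[str]:
    """Return the longest Item 1A candidate, or None if nothing matches."""
    candidates = [m.group("body").strip() for m in ITEM_1A.finditer(text)]
    # A table of contents also mentions "Item 1A", so keep the longest candidate.
    return max(candidates, key=len) if candidates else None

# Toy filing text, invented for illustration.
toy_filing = """
Table of Contents
Item 1A. Risk Factors 12
Item 1B. Unresolved Staff Comments 30

Item 1A. Risk Factors
Tariffs on imported goods could raise our costs and reduce demand.
Item 1B. Unresolved Staff Comments
None.
"""

section = extract_item_1a(toy_filing)
print(section)
```

Keeping the longest candidate is one simple way to skip the table-of-contents entry; a failed match returns `None`, which is the kind of case an extraction quality report would flag for a second look.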
The payoff is a genuine descriptive finding: tariff-related language in 10-K filings increased substantially from 2022 to 2025, the vocabulary shifted from narrow trade terms to broader policy-risk language (“trade war,” “liberation day”), and companies like Walmart didn’t mention tariffs at all until 2025. Paul’s broader pedagogical point is about workflow philosophy: the database is the deliverable, not the raw HTML files, and building it through Claude Code took about 30 minutes of interactive back-and-forth rather than days of manual scripting.
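The “database is the deliverable” idea can be sketched in a few lines. The example below is an assumption-laden toy, not Paul’s code: it uses Python’s built-in `sqlite3` as a stand-in for DuckDB, and the table name, keyword pattern, and three rows of text are invented purely for illustration.

```python
import re
import sqlite3

# Toy rows, invented for illustration: (ticker, fiscal_year, risk_text).
rows = [
    ("AAA", 2022, "Supply chain delays may affect margins."),
    ("AAA", 2025, "New tariffs and a trade war could raise costs. Tariffs may persist."),
    ("BBB", 2025, "Tariff policy is uncertain."),
]

# Hypothetical keyword list; a real project would choose these with care.
KEYWORDS = re.compile(r"\btariffs?\b|\btrade war\b", flags=re.IGNORECASE)

con = sqlite3.connect(":memory:")  # stand-in for a DuckDB file
con.execute("CREATE TABLE risk_factors (ticker TEXT, fiscal_year INTEGER, n_hits INTEGER)")
con.executemany(
    "INSERT INTO risk_factors VALUES (?, ?, ?)",
    [(ticker, year, len(KEYWORDS.findall(text))) for ticker, year, text in rows],
)

# Once the text lives in a table, the descriptive question is one query away.
hits_by_year = dict(con.execute(
    "SELECT fiscal_year, SUM(n_hits) FROM risk_factors "
    "GROUP BY fiscal_year ORDER BY fiscal_year"
).fetchall())
print(hits_by_year)
```

The point of the design is that the raw text is parsed once, and every later question becomes a query against the table rather than another pass over the files.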
It seems like it was only a dozen years ago that the phrase “big data” was new and intimidating, while now it’s commonplace and expected. And yet handling truly big data sources remains something that economists and social scientists are rarely actually competent at, except for, as I said, those right tail centaur social scientists/econometricians/computer scientists, like Paul (and maybe you!). And in today’s post, the fourth in the series, Paul walks us through this, arguing repeatedly that the marginal cost of doing this well has collapsed to zero thanks to Claude Code, representing probably one of the most significant gains to modern research there is.
The kind of exercise that he undertakes in front of his audience (i.e., Markus) involves him building a clean, queryable database from 18 years of HMDA mortgage data consisting of 291 million rows and roughly 70 GB of raw CSVs. And, hopefully this will not stop thrilling us for a while, entirely through Claude Code.
The motivating research question (“taste”) is how the geographic footprint of fintech mortgage lenders has shifted across the US from 2007 to 2024, but as with his third post on EDGAR that I mentioned, that research question is really just a foil for demonstrating the workflow pattern:
His central argument is that the fixed cost of doing data engineering properly, which historically was something most economists avoided because they were so bad at it, is now gone. We’re in many ways all Spartacus when it comes to such seemingly dangerous and impossible things.
The technical substance of the post is impressive because of how Paul shows us Claude handling a genuinely hard problem, which is that HMDA changed its entire column naming scheme and identifier system in 2018, so pre- and post-2018 files are essentially different datasets. This is actually something like what Caitlin and I noticed in our own on-screen discovery that Texas’s marriage certificates seemed to suddenly shift in 2018: the count in a single year plummeted from around 200,000 new marriage certificates filed to around 120,000, and a gap widened between our data and other data sources that we had asked Claude to find to check on data quality.
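The HMDA schema break described above is, at its core, a crosswalk problem. Here is a minimal sketch of the idea; the column names, the `harmonize` helper, and the units assumption are illustrative guesses rather than a verified HMDA crosswalk or anything taken from Paul’s pipeline.

```python
from typing import Dict

# Illustrative crosswalks (assumed names): map each era's columns to one shared schema.
PRE_2018 = {"as_of_year": "year", "respondent_id": "lender_id", "loan_amount_000s": "loan_amount"}
POST_2018 = {"activity_year": "year", "lei": "lender_id", "loan_amount": "loan_amount"}

def harmonize(record: Dict[str, str]) -> Dict[str, object]:
    """Rename one raw row into the shared schema, choosing the crosswalk by era."""
    is_old = "as_of_year" in record
    crosswalk = PRE_2018 if is_old else POST_2018
    out: Dict[str, object] = {new: record[old] for old, new in crosswalk.items()}
    out["year"] = int(out["year"])
    # Assumption: the older files report loan amounts in thousands of dollars.
    scale = 1000 if is_old else 1
    out["loan_amount"] = float(out["loan_amount"]) * scale
    return out

# Toy rows, invented for illustration.
old_row = {"as_of_year": "2012", "respondent_id": "0000123", "loan_amount_000s": "250"}
new_row = {"activity_year": "2021", "lei": "ABCDEF1234567890XYZ1", "loan_amount": "250000"}

harmonized_old = harmonize(old_row)
harmonized_new = harmonize(new_row)
print(harmonized_old)
print(harmonized_new)
```

Renaming is the easy half. The unit change shows why a crosswalk has to carry transformations as well as names, and lender identifiers that changed systems would need their own lookup table, which this sketch does not attempt.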
So this idea that Claude Code can find and then try to work through fixing such thorny problems, both well and fast, is a must see for anyone learning to use Claude Code for the first time precisely because of what I said earlier:
- it’s a high valued task
- it is very time consuming
- it’s easy to do it extremely badly, meaning incorrectly
- and the sanctions for failure can be high, maybe even career ending, if not caught soon
But Claude also navigates a 15x compression gain by converting CSVs to parquet (70 GB down to 6 GB), builds aggregate county-year tables with HHI and denial rates over 291 million rows in seconds using DuckDB, and classifies lenders as fintech versus traditional by extending a taxonomy Paul knows from a paper forward through 2024. Paul’s key conceptual contribution, though, is what he describes to Markus as the metadata table: a self-documenting table inside the DuckDB file that describes every column, its valid values, and its year availability, so any future session (or coauthor) can immediately understand the dataset without re-explanation.
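Both ideas in that paragraph, the county-year aggregates and the metadata table, fit in a short sketch. Everything below is an assumption for illustration: it uses Python’s built-in `sqlite3` as a stand-in for DuckDB, the table and column names are invented, and the four loan rows are toy data. HHI here is the usual sum of squared market shares, with shares measured in percentage points and computed over applications.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in for the DuckDB file
con.execute("CREATE TABLE loans (county TEXT, year INTEGER, lender_id TEXT, denied INTEGER)")
# Toy rows, invented for illustration: lender A has 3 applications, lender B has 1.
con.executemany("INSERT INTO loans VALUES (?, ?, ?, ?)", [
    ("01001", 2020, "A", 0), ("01001", 2020, "A", 0),
    ("01001", 2020, "A", 1), ("01001", 2020, "B", 0),
])

# County-year HHI (sum of squared percentage shares) and denial rate.
county_year = con.execute("""
    WITH lender AS (
        SELECT county, year, lender_id, COUNT(*) AS n
        FROM loans GROUP BY county, year, lender_id
    ),
    total AS (
        SELECT county, year, COUNT(*) AS n_total, AVG(denied) AS denial_rate
        FROM loans GROUP BY county, year
    )
    SELECT l.county, l.year,
           SUM((100.0 * l.n / t.n_total) * (100.0 * l.n / t.n_total)) AS hhi,
           MAX(t.denial_rate) AS denial_rate
    FROM lender AS l
    JOIN total AS t ON l.county = t.county AND l.year = t.year
    GROUP BY l.county, l.year
""").fetchall()
print(county_year)

# A self-documenting metadata table stored next to the data (hypothetical layout).
con.execute(
    "CREATE TABLE metadata (column_name TEXT, description TEXT, "
    "valid_values TEXT, years_available TEXT)"
)
con.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?)", [
    ("county", "County FIPS code", "5-character string", "2007-2024"),
    ("denied", "1 if the application was denied, else 0", "0, 1", "2007-2024"),
])

# A future session (or coauthor) can ask the database what a column means.
doc = con.execute(
    "SELECT description, valid_values, years_available FROM metadata WHERE column_name = ?",
    ("denied",),
).fetchone()
print(doc)
```

With shares of 75 and 25 percent, the toy county has an HHI of 75² + 25² = 6,250 and a denial rate of 0.25. The metadata table costs a few rows to write and saves re-explaining the dataset at the start of every new session.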
This is either subtle but not important or it’s subtle and very important, but it’s this weird theme I’ve been picking up on in Paul’s writings anyway, where he brings attention to a detail like that, which is that labels are data. It seems like he knows something about text-as-data that is well-known among those who work with it but not well-known among applied folks. And since I think we’re going to see an explosion of work on text-as-data going forward, due to AI agents collapsing the fixed and variable costs of working with it to zero and the extremely high valued problems in natural language processing beyond mere regex, the things Paul is focused on here are probably new enough that making them salient is really valuable.
But again, there is the picture. The “beautiful figure”, as I like to say to my Claude. Not only is the beautiful figure a beautiful figure, but it also tells a clear empirical story: that the fintech share of mortgage originations rose from about 1% in 2007 to a peak of 16% during the COVID refinancing boom in 2021, while the traditional bank share fell from a majority to below 40% by 2024, which extends and confirms a finding from other work about the post-COVID rate cycle.
I wrote this post because I feel a certain amount of responsibility to point readers to the best resources you can find out there on using Claude Code for empirical research, so if you have been following my Claude Code series, I think you should hit the subscribe and follow button on Paul’s work here. Doing these kinds of public teaching things is always a labor of love. In some ways that means the wages of it are the love itself. And I think Paul is like that, in that he’s good at it because he loves doing it, and he loves doing it because he’s good at being a professor and has the “taste” associated with that, in that he knows the way to do it and the way not to do it. And this series is an example of that.
So consider following him. Consider subscribing to him. And consider becoming a paying subscriber as well. I think the work he’s doing is good for the community.