Normally each Monday I post an update about Codechella Madrid, the third annual workshop on difference-in-differences and panel methods I do in Madrid with CUNEF and a few colleagues. But I’m waiting on feedback about one thing that I wanted to check on before doing another official update. But please sign up! And if you’re eligible for a student or post-doc discount (or their equivalent!), please do email me at causalinf@mixtape.consulting. In the meantime, here’s what today’s substack is about.
Here’s the video walk-through of what I did today. It’s again about Claude Code. (BTW, these thumbnails are crazy pictures! I don’t know how to not do it that way. I should ask Claude Code.)
This is the twenty-third installment in my series on working with Claude Code. Wow! The previous posts are collected [here]. If you want to try /insights yourself, it’s available in Claude Code today. Thanks for supporting the substack! It is a labor of love, so please consider becoming a paying subscriber. For only a cup of coffee ($5/month), you too can have full access to everything, ranging from difference-in-differences explainers, causal inference discussion, Claude Code updates, and all sorts of random stuff over the years. I think there are around 700-800 posts on this thing!
I Spent This Morning Asking Claude to Analyze Me
This morning, I ran a command called /insights, and it read through 73 of my Claude Code sessions (six weeks of work) and produced a portrait. Not a report. A portrait. It told me how many messages I’d sent (585), how many lines Claude had written for me (44,486), what I work on (5 domains, from econometrics to course development), and when I work (evenings, mostly, in the margins of my day).
Then it told me what I’m good at and what keeps going wrong. And then it made recommendations for how to improve. But as always, I wanted the information in deck format, and as I started working on the deck, I slowly slipped into the process of creating skills versus creating commands, and a way of doing that sort of opened up, which I’ll try to explain, but which you can see for yourself in the video walk-through.
Using /Insights As a Wellness Checkup / Myers-Briggs Personality Test
Before diving into what happened, let me explain my philosophy. I actually am not in the camp that the goal is to download other people’s /skills. That’s something Ethan Mollick said on LinkedIn too, which sort of confirmed, at least, that I’m not the only one. I’m firmly in the camp that you should be trying to use Claude Code intensively, repeatedly, on actual routine tasks like “research” or “teaching”, and then use /insights (a slash command you type at the Claude Code prompt when you run it in the terminal; see my video for what I mean if that’s new to you). You let Claude Code, in other words, analyze your usage, and then generate something like a psych profile of how you work: what your comparative advantage is, what you’re doing great, and what you need to work on. And that, to me, is going to have the higher level of success, because not everyone needs to be good at everything.
This is the principle of comparative advantage, and that’s the perspective to take, in my opinion, with the creation of /skills and /commands. You want to figure out your own style, and then create things for yourself based on that self-understanding. That’s what I mean when I say use /insights like it’s a Wellness Checkup. Or, maybe even more accurately, use it like a Myers-Briggs style personality profile: let it try to figure out what you’re great at and what you’re using CC for, so that it can maximize there. Of course we want a global max, not just a local max, but I think this is still the road to it.
The Deck Was the Point
But when you use /insights, you get an .html file in one of your hidden folders, so I show in the video walk-through how to find it by pressing command-shift-period. And while the html is fine, I’ve basically poisoned my brain to only want decks. It’s not so different from how I only drink IPAs now. I doubt I could do one of those things you see people doing at bars where they “drink the menu” by having a beer from the entire menu. All I want to do is drink IPAs. In the same way, I seem to only want decks. Good or bad, I use decks to learn more about what I’ve done, or where I’m going, since decks are sequential flashcards that tell a story. So you’ll see in the video walk-through not only how I created a self-diagnostic report using /insights; I also created a deck in my particular style, using a command I’ve made called /compiledeck and a detailed essay in my mixtapetools repository called “rhetoric of decks”.
So, Claude did that: it turned the insights html, with all its data, into a Beamer presentation, a proper slide deck following the Rhetoric of Decks philosophy I’ve been developing, with beautiful TikZ diagrams, custom matplotlib figures, and zero compile warnings. It had zero compile warnings because my compiledeck command (before this session, it was only a command, not a skill) insisted that it check and recheck for overfull/underfull hbox/vbox warnings until there weren’t any.
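For readers who want to reproduce that zero-warning check outside Claude Code, here is a minimal Python sketch. It assumes a standard LaTeX `.log` file, where box warnings appear on lines beginning with `Overfull \hbox`, `Underfull \hbox`, `Overfull \vbox`, or `Underfull \vbox`. The function names `box_warnings` and `is_clean` are hypothetical; this is not the contents of /compiledeck, just the kind of mechanical scan that can stand in for “check and recheck.”

```python
import re
from pathlib import Path

# Matches LaTeX's box warnings at the start of a log line, e.g.
# "Overfull \hbox (12.3pt too wide) in paragraph at lines 10--12"
BOX_WARNING = re.compile(r"^(Overfull|Underfull) \\[hv]box .*$", re.MULTILINE)


def box_warnings(log_text: str) -> list[str]:
    """Return every overfull/underfull hbox/vbox warning line in a LaTeX log."""
    return [m.group(0) for m in BOX_WARNING.finditer(log_text)]


def is_clean(log_path: Path) -> bool:
    """True when the compiled deck's .log contains no box warnings."""
    text = log_path.read_text(encoding="utf-8", errors="replace")
    return not box_warnings(text)


if __name__ == "__main__":
    sample = (
        "Overfull \\hbox (12.3pt too wide) in paragraph at lines 10--12\n"
        "Some other log line\n"
        "Underfull \\vbox (badness 10000) has occurred while \\output is active\n"
    )
    for line in box_warnings(sample):
        print(line)
```

A loop that recompiles, calls `is_clean`, and refuses to stop until it returns `True` is the simplest version of the “until there weren’t any” rule. Note that TikZ collisions never show up here, which is exactly the gap the rest of this post is about.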
Interestingly, /insights told me that LaTeX presentations were one of my top two work areas (20 sessions out of 73), which didn’t surprise me for the reasons I just listed. The decks aren’t a side product of my research. They are a major part of my research workflow. They’re like highlighting a manuscript, or taking notes in a journal, after reading a paper. They’re just so easy to create and so easy to tinker with that I use them religiously, which I think means I’m now moving into a new mental model of the world (the deck) without fully realizing it.
What the Portrait Revealed
So here, roughly, are the diagnostics /insights found. The analysis identified a pattern it called “ambitious delegation with sharp correction.” I give Claude roughly 8 instructions per session. I’m pretty sure every time I open a new terminal window, that counts as a new session, but I need to check. Point is, a session doesn’t seem to be the same thing as a project or a project folder. And for me, Claude executes roughly 37 actions per session, which it called “a 4.6x multiplier”. My personal, subjective style is to set direction and delegate to Claude, who does the legwork, and then (this is the crucial part) I audit the output aggressively.
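The 4.6x figure looks like simple arithmetic: roughly 37 actions divided by roughly 8 instructions. I’m assuming that is how /insights computes it; the helper below is a hypothetical illustration of the ratio, not the tool’s actual code.

```python
def delegation_multiplier(actions_per_session: float, instructions_per_session: float) -> float:
    """Actions Claude takes for each instruction the user gives (hypothetical helper)."""
    if instructions_per_session <= 0:
        raise ValueError("instructions_per_session must be positive")
    return actions_per_session / instructions_per_session


# Roughly 37 actions per session divided by roughly 8 instructions per session
multiplier = delegation_multiplier(37, 8)
print(f"{multiplier:.1f}x")
```

37 / 8 is 4.625, which rounds to the 4.6x in the report.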
Aggressive auditing turns out to be such a strong part of my own workflow that Claude flagged it. Twenty-seven times across those 73 sessions, Claude started down the wrong path. Twenty-three times it misunderstood what I asked. Thirty-four times the code had bugs. I haven’t fully automated these away; rather, I have a workflow that religiously inserts me into “the pipeline” at key points, which is how I catch these things. And sometimes the correction is sharp. The 82% success rate wasn’t despite the corrections. It was because of them. And that, I think, is the feature, not the bug: I’m inserted into the verification system, and that’s why my success rate is so high.
The Bézier Problem, or Why Spatial Awareness Matters
But without meaning to, today’s video session evolved into something I’d been wanting to start doing: using the substack to illustrate not my skills and commands, but rather how I go about finding which ones I need to make for myself, so that you can see how you might do the same kind of self-reflection with /insights and create your own solutions to your own unique problems. There really isn’t one solution, in other words. There are general principles that we all have to follow, for sure. But there are also subjective ones. Not all of us have to take heart medication to live, though all of us have to breathe air, drink water, and get enough calories to live. A doctor prescribes both, but not everyone needs heart medication. The healthy living protocol, though, is going to follow known biological principles, and given random fluctuations in your own biology, you’ll need to figure out how to tweak it until you get where you need to be.
So like I was saying, without really meaning to, the narrow challenge of creating “the perfect deck” became the test case for really shifting my workflow, and you’ll see that in the video walk-through.
With that said, while building the deck to show readers of the substack what I had found with /insights, I kept finding TikZ errors. They weren’t the overfull/underfull ones, since I had already built /compiledeck to double-check until those were gone, which is part of my “zero error” philosophy, one I encourage you to adopt in your own workflow too. I don’t mean zero error in some meta-research sense, though. I mean iterate until the errors in your AI-agent workflow are minimized to zero. Not minimized: minimized to zero. You have to take the position that you will never tolerate a mistake. And the only way to never tolerate a mistake is to find them, identify their causes, automate away what you can, and verify, verify, verify too. It’s both/and, not either/or.
The errors I kept finding, then, weren’t the typical Beamer overfull/underfull errors, but rather TikZ errors, which are harder (for an LLM) to identify. TikZ errors don’t generate compile errors because they live in coordinate space. They involve, for instance, text labels sitting on top of arrows, boxes overlapping boxes, and annotations bleeding into neighboring elements. And I kept asking Claude to fix them. And Claude kept missing them. But gradually, we started updating the /compiledeck command so that it didn’t just fix this one error, but also shut down the data generating process that created it.
Some were pretty easy to fix, but not all of them. For instance, one label that read “via the terminal” was placed between two boxes. It, and others like it, survived three rounds of me saying “fix this” before we figured out what was actually wrong. The text was wider than the gap between the two boxes. Claude was adjusting the vertical position (moving it up, moving it down) when the problem was horizontal. The text physically didn’t fit in the space. So we updated the command markdown to explain a new way of detecting that before it happened.
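The rule we wrote into the command markdown was prose, but the underlying check is just a width comparison along x. Here is a minimal sketch, assuming label width can be approximated as character count times an average glyph width; the 0.18 cm figure, the 0.1 cm padding, and the function names are hypothetical placeholders you would calibrate to your own fonts.

```python
def estimate_label_width(text: str, char_width_cm: float = 0.18) -> float:
    """Rough label width in cm: character count times an assumed average glyph width.

    0.18 cm per character is a hypothetical placeholder, not a measured value;
    calibrate it for your own font size.
    """
    return len(text) * char_width_cm


def label_fits_between(text: str, left_box_right_cm: float, right_box_left_cm: float,
                       padding_cm: float = 0.1, char_width_cm: float = 0.18) -> bool:
    """True if the label plus padding on both sides fits in the horizontal gap.

    The comparison is along x, the axis that matters for a label squeezed
    between two side-by-side boxes; moving the label up or down doesn't change it.
    """
    gap = right_box_left_cm - left_box_right_cm
    needed = estimate_label_width(text, char_width_cm) + 2 * padding_cm
    return needed <= gap


if __name__ == "__main__":
    # Hypothetical coordinates: the left box ends at x=2.0, the right box starts at x=4.0.
    print(label_fits_between("via the terminal", 2.0, 4.0))  # 2.88 + 0.2 cm > 2.0 cm
```

The design choice is to test the horizontal fit before anyone touches vertical placement, which is exactly the axis Claude kept ignoring.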
TikZ Errors and Bézier Curves
But then there were the curves, which, through repeated trials, seemed to be something different from the earlier errors. These curve errors were labels floating over an arrow, and often it was a curved arrow. After talking to Claude about it repeatedly, we learned these were called Bézier curves, and they were creating their own unique kind of error that I could see but Claude couldn’t, because Claude doesn’t have eyeballs. The issue is pretty simple: TikZ lets you draw curved arrows with an option like bend left=35, and those curves follow a mathematical path. But Claude wasn’t putting two and two together to track the curve; it kept placing labels in the path of these curves, with text sitting right where the arrow sweeps through. I’d point it out, Claude would fix that one instance, and the same error would appear on another slide.
This was when we started to inch towards updating /skills and /commands. We did something that turned out to be the most important part of the entire session. Instead of just fixing each error, I asked: Claude, why did you miss this? No judgment; I wasn’t trying to slam him. I was asking Claude to reflect on the causes of his own failures, because maybe if he could see the cause of a failure, he could identify the DGP for that failure, and we could surgically go to that DGP and make it stop, not just for this one graphic, but for all graphics.
And Claude was honest. It said it couldn’t intuit where a Bézier curve passes at a given point. It was eyeballing, estimating based on intuition rather than computing. And when I asked it to audit its own work, it re-ran the same flawed intuition and got the same wrong answer.
But eyeballing is pointless with Bézier curves, because they follow equations. The fix was a formula. The maximum depth of a curved arrow is (chord / 2) × tan(bend_angle / 2). Once we wrote that down, Claude could compute a number and compare it to the label’s position. It was, in other words, arithmetic, not spatial reasoning. And that’s an important point, because LLMs are notoriously bad at spatial reasoning, so to help them overcome that constraint, you need fixes designed for that problem.
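Here is that formula as code, a minimal sketch assuming a label is described by the signed distance of its centre from the chord (positive on the side the arrow bulges toward) and its half-height, both in cm. The formula is exact for a circular arc that leaves each endpoint at the bend angle; TikZ’s `bend left`/`bend right` actually draws a cubic Bézier, so I treat the result as an estimate and keep a safety margin. `max_bend_depth`, `label_clears_bend`, and the 0.15 cm margin are hypothetical.

```python
import math


def max_bend_depth(chord_cm: float, bend_angle_deg: float) -> float:
    """Peak distance between a bent arrow and the straight chord joining its ends.

    (chord / 2) * tan(bend_angle / 2) is exact for a circular arc that leaves each
    endpoint at bend_angle to the chord; TikZ draws a cubic Bezier, so this is an
    estimate and callers should keep a margin.
    """
    return (chord_cm / 2) * math.tan(math.radians(bend_angle_deg) / 2)


def label_clears_bend(chord_cm: float, bend_angle_deg: float,
                      label_offset_cm: float, label_half_height_cm: float,
                      margin_cm: float = 0.15) -> bool:
    """Conservative check for a label centred over the chord's midpoint.

    label_offset_cm is the signed distance of the label's centre from the chord,
    positive on the bulge side. The label passes only if it sits entirely beyond
    the apex of the arrow or entirely on the opposite side of the chord.
    """
    depth = max_bend_depth(chord_cm, bend_angle_deg)
    near_edge = label_offset_cm - label_half_height_cm
    far_edge = label_offset_cm + label_half_height_cm
    beyond_apex = near_edge >= depth + margin_cm
    opposite_side = far_edge <= -margin_cm
    return beyond_apex or opposite_side


if __name__ == "__main__":
    print(round(max_bend_depth(4.0, 35), 2))          # depth of a 4 cm arrow with bend=35
    print(label_clears_bend(4.0, 35, 0.5, 0.15))      # label sits on the arrow
    print(label_clears_bend(4.0, 35, 1.0, 0.15))      # label clears the apex
```

For a 4 cm chord with `bend left=35`, the depth comes out to about 0.63 cm, so a label centred 0.5 cm off the chord lands on the arrow, while one centred 1.0 cm off clears it. The check is deliberately conservative: it only accepts labels fully beyond the apex or on the other side of the chord.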
And that led us to start building a taxonomy of such spatial reasoning problems. For instance, we found another class: arrows crossing arrows. Same underlying issue: Claude couldn’t see that a curved return arrow would intersect a vertical branch. It needed a different formula, built on the same principle of converting the spatial problem into a computational one.
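One way to turn the arrow-crossing problem into arithmetic is to sample the curved arrow as a list of points and test whether consecutive points straddle the vertical branch’s x-coordinate at a height inside the branch. This sketch assumes the bend is approximated by a circular arc (the same assumption behind the depth formula) and that `bend left` bulges to the left of the direction of travel; `arc_points` and `crosses_vertical` are hypothetical helpers, not TikZ functions.

```python
import math


def arc_points(ax, ay, bx, by, bend_deg, n=200):
    """Sample a 'bend left' arrow from A to B, approximated as a circular arc.

    The arc leaves A and enters B at bend_deg to the chord (0 <= bend_deg <= 90)
    and bulges to the left of the direction A -> B.
    """
    if not 0 <= bend_deg <= 90:
        raise ValueError("bend_deg must be between 0 and 90")
    dx, dy = bx - ax, by - ay
    chord = math.hypot(dx, dy)
    ux, uy = dx / chord, dy / chord          # unit vector along the chord
    nx, ny = -uy, ux                         # unit normal pointing left of A -> B
    mx, my = (ax + bx) / 2, (ay + by) / 2    # chord midpoint
    theta = math.radians(bend_deg)

    def height(s):
        """Distance of the arc from the chord at signed position s along it."""
        if theta == 0:
            return 0.0
        r = (chord / 2) / math.sin(theta)
        return math.sqrt(max(r * r - s * s, 0.0)) - r * math.cos(theta)

    points = []
    for i in range(n + 1):
        s = -chord / 2 + chord * i / n       # runs from A (s=-chord/2) to B (s=+chord/2)
        h = height(s)
        points.append((mx + ux * s + nx * h, my + uy * s + ny * h))
    return points


def crosses_vertical(points, x_v, y_lo, y_hi):
    """True if the sampled curve crosses the vertical segment x = x_v, y_lo <= y <= y_hi."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 == x1:
            continue  # skip a perfectly vertical piece of the sampled curve
        if (x0 - x_v) * (x1 - x_v) <= 0:
            t = (x_v - x0) / (x1 - x0)       # linear interpolation along this piece
            y = y0 + t * (y1 - y0)
            if y_lo <= y <= y_hi:
                return True
    return False


if __name__ == "__main__":
    # Hypothetical layout: a return arrow from (4, 0) back to (0, 0), bend left=35,
    # and a vertical branch at x = 2 running from y = -1.5 to y = -0.3.
    pts = arc_points(4, 0, 0, 0, 35)
    print(crosses_vertical(pts, 2.0, -1.5, -0.3))
```

In this made-up layout the return arrow dips to about y = -0.63 at x = 2, inside the branch’s span, so the check flags a crossing. Moving the branch to run from y = -3 to y = -1 would clear it.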
But then we found yet a third class: an annotation rectangle whose left edge extended into the neighboring box. This one was subtler. No formula would catch it automatically. You just had to notice that x=3.6 (the rectangle edge) was less than x=3.8 (the box edge), meaning they overlapped by 0.2cm. Claude had the numbers. It just didn’t spontaneously compute the spatial implication.
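Once the coordinates are written down, though, the comparison itself is mechanical: it is an axis-aligned rectangle intersection. A minimal sketch follows; the x-values 3.6 and 3.8 come from the example above, while the other coordinates and the `Box`/`overlap` names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in TikZ coordinates (cm)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def overlap(a: Box, b: Box) -> tuple[float, float]:
    """Width and height of the intersection of two boxes; (0.0, 0.0) if they don't overlap."""
    w = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    h = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    if w <= 0 or h <= 0:
        return (0.0, 0.0)
    return (w, h)


if __name__ == "__main__":
    # Neighboring box ends at x=3.8; the annotation rectangle starts at x=3.6.
    neighbor = Box(1.0, 0.0, 3.8, 1.0)
    annotation = Box(3.6, 0.2, 6.0, 0.8)
    w, h = overlap(neighbor, annotation)
    print(f"overlap: {w:.1f} cm wide, {h:.1f} cm tall")
```

Running every pair of boxes through `overlap` and failing on any nonzero result is the “spontaneous” computation Claude wasn’t doing on its own.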
I’ve read that LLMs struggle with spatial reasoning. The classic example is chess, where the model knows the rules but can’t reliably track where the pieces are after fifteen moves. This is the same thing. Claude knows TikZ syntax perfectly. It just can’t hold a mental map of where everything is on the slide.
W. Edwards Deming and Zero Error Philosophy For Your Workflow
Here is the final lesson I want you to take away, which I tried to make apparent in the video walk-through. Don’t see errors and failures as bad things. Rather, use them to make lemonade out of lemons. What I mean is, let these errors guide you and Claude toward diagnosing the causes of his own failures, and then let that discovery lead you to create your own skills.
There’s a man named W. Edwards Deming, a statistician who went to Japan after the war and taught them something American manufacturers had been ignoring. The core idea wasn’t complicated: every error is information. Don’t just fix the defect. Find out why the defect happened, and change the process so it can’t happen again.
That’s what we did with the Bézier problem. We didn’t just move the label once. Rather, we wrote a formula, put it in a reference document, and restructured the entire TikZ verification workflow so that curved arrows get checked first, before anything else, using arithmetic instead of eyeballing. Then we found the arrow-crossing-arrow problem and added that. Then the annotation-overlap problem. Each failure became a new rule.
By the end of the session, the tikz_rules.md file had 9 rules and a five-pass verification workflow, organized not by type of error but by order of operations: Bézier curves first (because they’re the most dangerous and the most systematic), then gap calculations, then label positioning, then everything else, then open the PDF and look.
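The order-of-operations idea can be sketched as a runner that always executes the passes in a fixed sequence and reports any pass that is missing instead of silently skipping it. The pass names mirror the list above, but they, `run_passes`, and the placeholder checks are hypothetical; this is not the contents of tikz_rules.md.

```python
from typing import Callable, Dict, List, Tuple

# Pass names follow the order described in the post; the names themselves are illustrative.
PASS_ORDER = [
    "bezier_curves",      # 1. curved arrows vs. labels, checked first
    "gap_calculations",   # 2. does text fit between neighboring boxes?
    "label_positioning",  # 3. remaining label placement
    "everything_else",    # 4. any other rule
    "visual_check",       # 5. open the PDF and look
]

Check = Callable[[], List[str]]


def run_passes(checks: Dict[str, Check]) -> List[Tuple[str, str]]:
    """Run each pass in PASS_ORDER and collect (pass_name, issue) pairs.

    A pass with no registered check is itself reported, so nothing is silently skipped.
    """
    issues: List[Tuple[str, str]] = []
    for name in PASS_ORDER:
        check = checks.get(name)
        if check is None:
            issues.append((name, "pass not implemented"))
            continue
        for problem in check():
            issues.append((name, problem))
    return issues


if __name__ == "__main__":
    demo = {name: (lambda: []) for name in PASS_ORDER}
    demo["bezier_curves"] = lambda: ["label sits on a bent arrow"]
    for pass_name, issue in run_passes(demo):
        print(pass_name, "->", issue)
```

The final pass can only ever remind you to open the PDF; no function replaces actually looking.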
The file is a collection of what I’d call prosthetic spatial reasoning. Each rule compensates for a specific blind spot in the AI’s ability to reason about where things are on a page. And it will keep growing. Every time I find a new class of error, we’ll add a rule.
Skills vs. Commands, or Why the Container Matters
So, if we adopt this Deming-like philosophy of “zero errors”, it means not just believing in it; we have to entrench it. It has to be entrenched within the workflow itself, meaning both the parts that can be codified into /skills and /commands and the parts that have to remain in the human verification process. There will never be a time when there is no human verification, because you and I are 100% responsible for everything we do as scientists. But we can minimize these errors as much as possible through properly designed workflows, so that when we do insert ourselves, our time is used more efficiently.
But since my errors are produced by a “Scott Cunningham fixed effect”, the solutions need to be designed with me, and not someone else, in mind too. And that’s where /insights comes in. You can use /insights as a Myers-Briggs type of tool that figures you out, and thus helps you identify solutions that work for you.
One of the things /insights revealed was that my compiledeck tool (the set of instructions that tells Claude how to build a slide deck) had been a command and not a skill. The distinction matters because skills and commands cascade through your Claude Code interactions differently.
A command is a single file of instructions that Claude reads once. Think of it as a memo you hand to a research assistant. They read it and do their best. But the problem is that “do your best” isn’t good enough when you need zero errors. A memo that says “check for TikZ collisions” doesn’t work if the RA doesn’t know how to measure a collision. The instruction is aspirational, not operational.
So over the course of that video walk-through, Claude and I converted it into a skill. A skill differs from a command in that it is a structured directory with multiple files. The main file has the operational workflow. A separate file has the TikZ rules with actual formulas. Another file has color palettes extracted from real decks. Another has domain-specific patterns for different types of presentations.
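For anyone converting a command into a skill, here is a minimal Python sketch that scaffolds the directory layout described above. It assumes Claude Code’s convention that a skill is a folder containing a `SKILL.md` file with `name` and `description` in YAML frontmatter. `tikz_rules.md` is the file named in this post; `color_palettes.md`, `deck_patterns.md`, and `scaffold_skill` itself are hypothetical.

```python
from pathlib import Path

# Supporting files. tikz_rules.md is named in the post; the other two names are hypothetical.
SUPPORTING_FILES = {
    "tikz_rules.md": "# TikZ rules\n",
    "color_palettes.md": "# Color palettes\n",
    "deck_patterns.md": "# Deck patterns\n",
}


def scaffold_skill(skills_root: Path, name: str, description: str) -> Path:
    """Create <skills_root>/<name>/ with a SKILL.md entry point and supporting files.

    Files that already exist are left untouched, so hand-edited rules are never overwritten.
    """
    skill_dir = skills_root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    files = {
        # SKILL.md holds the operational workflow, behind YAML frontmatter.
        "SKILL.md": f"---\nname: {name}\ndescription: {description}\n---\n\n# Workflow\n",
        **SUPPORTING_FILES,
    }
    for filename, body in files.items():
        path = skill_dir / filename
        if not path.exists():
            path.write_text(body, encoding="utf-8")
    return skill_dir
```

To put the skill in the global location, you would pass `Path.home() / ".claude" / "skills"` as `skills_root`. Because existing files are skipped, rerunning it won’t clobber rules you’ve already written.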
The difference between a command and a skill, in other words, is the difference between telling someone what to do and training them how to do it. And critically, the skill lives in a global directory (~/.claude/skills/) instead of being buried in a single project folder, so it’s available no matter where I’m working.
That was another friction point /insights identified: I’d built the tool in one place and then couldn’t access it from another.
The Comparative Advantage Problem
Here’s where I part ways slightly with the “starter pack” philosophy that’s popular in the AI productivity world right now. Several people have suggested sharing prompt libraries and workflow templates: downloading someone else’s system and plugging it in. I understand the appeal. But I’m skeptical of it, for economic reasons.
For one, I believe in comparative advantage, and comparative advantage is specific to each person’s own production function, which is extremely personal. The insights analysis showed me that my edge is “ambitious delegation with sharp correction”: I delegate more than most users, but I also audit more aggressively. That’s not a transferable template, nor should it necessarily be, at least not in the same way. It’s more of a disposition than a template. Someone who delegates without auditing gets a different set of errors, requiring their own solutions, because the data generating process is different and unique to that person. Someone who audits without delegating never gets to the ambitious projects unless they figure out how to solve that problem, which is unique to their own style.
The TikZ rules we built today are specific to my workflow. I make a lot of slide decks. I use a lot of TikZ diagrams. I’m particular about visual quality. Someone who primarily writes Python scripts and never touches LaTeX would need completely different rules. The /insights data would tell them completely different things about where their friction lives.
This is why I think the right model isn’t downloading someone else’s workflow. It’s regularly running /insights on your own usage, finding your own friction points, and iteratively building your own set of skills and rules. The universal principles exist: zero tolerance for errors, convert spatial problems into computational ones, and always ask why an error happened instead of just fixing it. But the specific implementation is yours.
What I’m Going to Do Now
I’m going to keep using /insights regularly, maybe every few weeks, to check in on my own patterns. Each time, I expect to find new friction points that I didn’t notice before, because the old ones will have been fixed. This is Deming’s insight applied to individual knowledge work: the process of improvement is itself a process that improves.
The decks will remain my thinking tool. The skills will keep growing as I find new blind spots. And I’ll keep writing about it here, not because my workflow is the right one for you, but because the process of finding your workflow is universal even when the result is personal.
