Claude Code has made it easier to do research now. But it's about to get a lot harder to publish in traditionally valued outlets.
It's something I've been thinking about since early this year. And it sort of coalesced for me when I sat down and prompted Claude Code to fully automate a paper from the vaguest proposal I could come up with. It came up with the idea, a shift-share identification strategy (which I then, on a second prompt, had it go deeper into by reviewing Peter Hull's repository for his shift-share IV workshop at Mixtape Sessions), crawled the web until it found suitable data, did the analysis, and wrote the paper. I then submitted that paper to refine.ink, paid around $40-50 for my referee report, uploaded that to the directory, had Claude make all revisions, then had referee2 (a persona from my mixtapetools repo) critique the paper, opened up two terminals and had agents code-audit it by rewriting the entire pipeline in two other languages, confirmed no coding errors, resubmitted it to refine.ink one last time, and then concluded.
The whole experience cost me $100 in refine.ink payments and a couple of hours max. I've only skimmed the paper, but the experience was enough to make me think that paper mills are coming — not on the journal side, to be sure, but on the actual paper production side. What I mean is I now suspect we'll see a nontrivial amount of paper milling at the source — the researchers themselves. And so like any economist, I thought and thought, and the result is this substack, which is basically Claude Code fan fiction about the new economics of academic publishing set in the very near future. It's a bit of rambling, with simulations based on observed distributions, and some simple economic reasoning with assumed large elasticities. But that's why it's Claude Code fan fiction.
Thanks again, everyone, for your support of the substack. It's a labor of love. If you aren't a paying subscriber, please consider becoming one!
Coral Hart used to write 10 to 20 romance novels a year. Now she writes more than 200. The difference, she said, is ChatGPT. She describes it as "help," though that word is doing an enormous amount of work in that sentence. She brings in six figures doing this now, which you get the sense is coming more from volume than from quality itself. The New York Times profiled her in February.
Hart said she has seen a 10-20x increase in cognitive output. That large a gain came from using a much simpler LLM method than what's available now with Claude Code and other agent-based writing systems. And she's writing romance novels — a genre with conventions, a readership that values volume, and a distribution channel (Amazon) that will publish anything you upload. The only bottleneck was the author's time, and the tool eliminated that bottleneck.
But what happens when the same productivity shock hits a system where the bottleneck was never really production in the first place, but rather a hierarchical journal structure that depended immensely on editor time, skill, and discretion, and on volunteer workers with the same skills, called referees, to screen for quality deemed sufficient for publication? What about the quality of those papers? What about publishing? After all, there is a difference between writing a manuscript and publishing it at a journal — the latter happens after the paper is written. What will happen to publishing?
The distribution will change
If the unconditional probability of acceptance at a top-5 journal is around 3-5%, and the cost of producing a submission-quality paper drops to near zero, then the expected value calculation is simple. Write 100 papers. Submit them all. Manage a massive portfolio. Though most will fail, you only need a few to land. You can't win the lottery if you don't buy a ticket.
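The lottery logic is easy to check. A minimal sketch, treating each submission as an independent draw with an acceptance probability in the 3-5% range (both the independence and the probabilities are assumptions for illustration):

```python
# Probability of landing at least one acceptance when you submit n
# independent papers, each with acceptance probability p.
def p_at_least_one(n, p):
    return 1 - (1 - p) ** n

for p in (0.03, 0.05):
    for n in (3, 10, 100):
        print(f"p={p:.2f}, n={n:3d}: P(>=1 acceptance) = {p_at_least_one(n, p):.3f}")
```

At p = 0.04, a 100-paper portfolio gives roughly a 98% chance of at least one hit — the whole expected-value argument in one line. Independence is doing heavy lifting here, of course; an editor who recognizes a mill makes the draws very much not independent.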
Suppose the value of a top-5 hasn't changed — at least not yet. Then, if the cost of exercising that option has collapsed, the number of new submissions will depend on the magnitudes of various elasticities measuring responses across the pipeline. My hunch is that many nodes have supply responses that have become more elastic, meaning we should expect large supply responses — but not all, and where they've remained inelastic, we should expect bottlenecks, and therefore queuing, and almost certainly the injection of some noise.
Reimers and Waldfogel studied what happened to book publishing after ChatGPT launched. The number of new titles on Amazon tripled. Average quality fell. The best books didn't change much — the frontier stayed where it was. But the mass of new entries came from the left tail of the quality distribution.
I'll elaborate on the numbers in this graphic later, but for now take it as a visual to guide you through the fan fiction essay. The green is the number of papers of highest quality, proxied by publications across 87 journals (which I pulled out of articles I found online). There are around 3,800 publication slots historically there. The yellow is the number of human submissions pre-AI. This was calculated by going through all 87 journals, approximating their acceptance rates, and using the average number of issues and articles published each year. While acceptance rates range from 5 to 20% across the top 87 journals, the overall average is closer to 10%. Hence the extrapolation to 39,016. I figure this is wrong, but not by much.
But the blue is a normally distributed and sizable 5x increase in submissions coming from AI. Some of these will be fully automated, meaning they were produced in only a few hours without a human in the loop, while others will take weeks with a human fairly intensively in the loop, but still result in a new manuscript in a fraction of the historical time. And I model it as normally distributed because paper quality is the outcome of many independent factors — topic, data, execution, writing — and quantities shaped by many independent inputs tend toward normal.
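To see why a normal quality distribution plus a 5x volume shock matters, here is a toy simulation in the spirit of the graphic. The quality scale, the placement of the AI mean below the human mean, and the fill-slots-by-quality rule are all my assumptions, not estimates:

```python
import numpy as np

rng = np.random.default_rng(0)

SLOTS = 3_800                   # annual slots across the 87 journals
HUMAN_N = 39_000                # pre-AI human submissions
AI_N = 4 * HUMAN_N              # the extra submissions in a 5x world

# Quality on an arbitrary scale: AI mass centered below the human mean,
# but still normal, so its right tail overlaps the human distribution.
human = rng.normal(loc=0.0, scale=1.0, size=HUMAN_N)
ai = rng.normal(loc=-1.0, scale=1.0, size=AI_N)

pooled = np.concatenate([human, ai])
cutoff_pre = np.sort(human)[-SLOTS]    # marginal published paper, pre-AI
cutoff_post = np.sort(pooled)[-SLOTS]  # marginal published paper, post-AI

print(f"pre-AI quality cutoff:  {cutoff_pre:.2f}")
print(f"post-AI quality cutoff: {cutoff_post:.2f}")
print(f"AI papers clearing the new bar: {(ai >= cutoff_post).sum():,}")
```

If slots were filled purely on measured quality, the bar would rise — and a nonzero number of AI papers would still clear it. That is the right-tail-at-scale point in two distributions.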
Now look at what's already happening in economics. The University of Zurich's Social Catalyst Lab is running something called Project APE — Autonomous Policy Research. It uses Claude Code to autonomously generate empirical economics papers. Not drafts. Full papers with identification strategies, data collection, estimation, tables, figures, and writeups. As of this writing, it has produced 204 papers — with 60 added in a single week. Their stated goal is 1,000.
But are they any good? In head-to-head matchups, the AI papers win 4.7% of the time against human papers from the AER and AEJ: Policy. The Elo gap is huge — 1,154 for the average AI paper versus 1,831 for the average AER-equivalent article. Here you can see signs of the distribution being both normal and having enough mass in the right tail to warrant the idea that some papers might be good enough for top-quality outlets — but only achievable at scale.
So as you can see in the graphics above, a few AI papers do crack the top 40 out of 247 total entries. Which is what you'd expect if the AI papers come from a normal distribution — remember, the tails of the normal can theoretically stretch from negative infinity (blinding in their awfulness) to positive infinity (one-in-a-million spectacular). And the newest cohort they've been working on is already improving, with a slightly higher 7.6% win rate.
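As a sanity check, the reported Elo gap and the reported win rate can be compared, assuming the lab scores matchups with the standard logistic Elo formula (an assumption of mine about their setup):

```python
def elo_expected_score(r_a, r_b):
    """Expected score for player A against player B under standard Elo."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

# Average AI paper (1,154) vs average AER-equivalent article (1,831).
p_win = elo_expected_score(1154, 1831)
print(f"Elo-implied win probability: {p_win:.3f}")  # roughly 0.02
```

The formula implies the average AI paper beats the average AER-equivalent article about 2% of the time — the same order of magnitude as the observed 4.7%, with the difference plausibly coming from averaging over a spread of ratings rather than comparing two fixed ones.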
And consider this. These are fully automated papers, like a version 1.0, with no human iteration at all. What might happen if the papers get deep close looks, or perhaps get refined through something like refine.ink?
Journal revenue in the short run
I tried to work out some simple back-of-the-envelope numbers for this illustration, using as my baseline things I found here and there. So let's start with some basic, though approximate, baseline facts about the one profession I feel qualified to talk about — my own. Economics.
There are roughly 12,000 research-active economists who submit to ranked journals. Currently they generate about 39,000 submissions per year — roughly 3 per researcher. If the average goes from 3 to 10, that's more than a 3x increase from current authors alone. Then add in new entrants who previously couldn't produce at submission quality and you're at 4-5x. Which is how I arrive at 5x.
But 3D printing a manuscript isn't the full cost of publishing, because you must also pay journal fees upon submitting. That scales linearly. Still, the cost of this portfolio is trivially low. The average submission fee is $112. Going from 3 to 10 submissions costs an additional $784 in fees. Add a Claude Max subscription at $200 a month. The total annual cost of tripling your output is about $3,200. That's less than one conference trip. Not everyone can afford it, but given that a single top-5 publication is worth a lot in present discounted expected value, and given economists' wages, I expect there's a nontrivial number of people at that threshold. Plus coauthors can split it.
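The back-of-the-envelope above, written out (the $112 average fee, the $200/month subscription, and the submission counts are the essay's numbers; everything is annual):

```python
AVG_SUBMISSION_FEE = 112      # dollars, average across ranked journals
CLAUDE_MAX_MONTHLY = 200      # dollars per month
OLD_SUBMISSIONS, NEW_SUBMISSIONS = 3, 10

extra_fees = (NEW_SUBMISSIONS - OLD_SUBMISSIONS) * AVG_SUBMISSION_FEE
subscription = 12 * CLAUDE_MAX_MONTHLY
total = extra_fees + subscription

print(f"extra submission fees: ${extra_fees}")    # $784
print(f"subscription:          ${subscription}")  # $2400
print(f"total annual cost:     ${total}")         # $3184, i.e. about $3,200
```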
Demand for one of those 3,800 slots at current fee levels is almost perfectly inelastic. Let me abuse the idea of an elasticity a little to illustrate this. Given the quantity increase in submissions, journals can raise prices and still see more submissions than they had before Claude Code. That's not the elasticity, which is a ceteris paribus measure, but it's worth keeping in mind too. Either way, they're looking at anything from a swell to a rogue wave bearing down on them.
I pulled data on 87 economics journals — top 5, general interest by tier, the AEJ series, top field, second tier, and third tier — and then grouped them into categories with approximate acceptance rates. Together they publish about 3,800 articles per year and receive roughly 39,000 submissions.
These 3,800 slots are fixed in the short run. Journals can't print more pages, hire more editors, or expand their issues overnight. The number published doesn't respond to the rightward shift in supply; the system simply allocates 3,800 of the submissions into the 3,800 slots.
The top-5 currently accept about 5% of submissions. At 5x volume, that drops to 1%. At 10x, it's 0.5%. So acceptance rates must fall if journals do nothing.
So let's assume for now that journals do nothing except what they've been doing. Then what? Then they're about to make a lot of money.
At current volumes, these 87 journals collect roughly $6.2 million per year in submission fees. At 5x, that's $31 million. The top-5 alone would go from $812,000 to $4.1 million — mostly from papers that get desk rejected within a week.
Editors, referees, and bottlenecks
Every submission will have run every conceivable robustness check. Every paper will have been through refine.ink, probably multiple times. Economics articles are already notoriously long. They're about to get longer. Expect more appendices. Expect better writing and more "beautiful figures."
Consider the economics of a service like refine.ink. Ben Golub's service sits at exactly the right place in the production chain to sometimes get paid multiple times for the same paper — before submission, during editorial screening, during review, and again after the R&R. That's potentially four to five payments per paper. It's a good business model because it solves a bottleneck problem created by human review. Not only will researchers be paying more journal fees; they will be paying verification fees too.
But the perverse result is that every paper becomes harder to distinguish because of such intense, repeated polishing. When every submission is polished and empirically meticulous, the signal-to-noise ratio for editors doesn't improve — it gets worse. The marginal information content of "this paper is well-executed" drops to zero because the left tail no longer trails off. Rather, it hits a wide wall of very similar-looking papers, written well, with data, execution, and probably interesting results. Editors' desk skills at immediately rejecting those below the bar are likely to be stretched — I think they will be — as they parse through a flood of papers, and if they can't keep up — if they fall back on heuristics — then the question is how biased those heuristics will be in this new environment.
But the desk reject is only the first stage. The second is the refereeing. Submissions can multiply by 5x, but the referee pool can't multiply by 5x, as it's limited by the number of PhDs. Most referees aren't paid — just as taxes are the price of living in a civilized society, serving as a referee is the price of living in the academic society. You're asking tenured professors to spend 10-20 hours evaluating someone else's paper as a professional obligation. At current volumes, this barely works. At 5x, it breaks. Actually, it'll probably break at 1.5x.
We need to make some guesses about the desk rejection rate as well as the referee pool. Let's assume, then, that the referee pool stays fixed. If that happens, the desk rejection rate has to rise from maybe 50% to probably closer to 90% just to keep the system from collapsing. Editors would be rejecting 173,000 manuscripts a year on a skim — 9 out of 10 papers, dead on arrival, with less time per paper.
Inevitably, pattern-matching shortcuts emerge. Like what? Well, what's observable besides the manuscript that might be tied to quality? Maybe researcher pedigree, name recognition, institutional affiliation. If these are correlated, even weakly, with quality, then maybe editors update on them to try to cut through the noise. But that is imperfect, not to mention unfair, and so desk rejection gets noisier: good papers get killed by tired editors, and marginally lower-quality papers slip through to referees. It's a cascading failure: volume breaks editors, broken editing wastes referees, wasted referees slow science.
But what if some of the 5x increase in submissions gets passed on to the referees? Well, at 5x submissions, without an aggressive increase in desk rejection, the system would need over 146,000 referee reports per year — against a realistic supply of maybe 54,000. That's because you historically have somewhere between 2 and 5 referees per paper. And you cannot tap the same human resource three times harder and expect it to comply. At some point the whole "taxes are the price of civilization" argument breaks down. Citizens have been known to revolt against tax policy anyway, even modest ones.
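One set of assumptions consistent with those report counts — the 75% desk-rejection rate and 3 reports per refereed paper are my fill-ins; the essay only gives the 2-5 referee range and the totals:

```python
BASE_SUBMISSIONS = 39_000
SHOCK = 5                      # the 5x supply shock
DESK_REJECT_RATE = 0.75        # assumed share desk-rejected before referees
REPORTS_PER_PAPER = 3          # within the historical 2-5 referee range
REPORT_SUPPLY = 54_000         # the essay's guess at realistic annual supply

submissions = SHOCK * BASE_SUBMISSIONS
refereed = submissions * (1 - DESK_REJECT_RATE)
reports_needed = refereed * REPORTS_PER_PAPER

print(f"reports needed:   {reports_needed:,.0f}")  # 146,250
print(f"reports supplied: {REPORT_SUPPLY:,}")
print(f"shortfall:        {reports_needed - REPORT_SUPPLY:,.0f}")
```

Even with three-quarters of submissions killed at the desk, demand for reports runs nearly three times the supply.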
So what fills the gap? The same thing causing the problem: LLMs. The honest answer might make people uncomfortable, but consider this — humans weren't being paid to referee in the first place. It has always been voluntary, unpaid labor. The human-centric system has run well enough for decades to centuries, depending on what we mean, but keep in mind two things: for most of the history of science, human peer review didn't exist, and human peer review has helped cause well-documented forms of publication bias, including replication crises. I think refine.ink sees a shift toward intensive use of LLMs for refereeing as a very near equilibrium scenario — just look at the third option under their subscription model: "best for editors and frequent publishers."
The arms race nobody wins
Here's the problem with the expected value calculation I laid out earlier. It's correct for any individual researcher — but when everyone does it, the collective outcome is worse for almost everyone. This is probably close to a prisoner's dilemma.
If a researcher is the only one scaling submissions using LLMs, that person gains an edge. But if the gains are real, they won't be the only one. And so in the new equilibrium, everyone is producing 2-3x more papers, causing acceptance rates to drop and, in turn, the probability of publishing any given paper to fall — despite arguably fewer coding errors and perhaps even individually better work. But now, to stay in that new equilibrium, they're spending an extra $3,200 a year and the entire profession is running faster to keep up with 3,800 slots. And you can't unilaterally stop, because if you go back to 3 papers while everyone else is at 10, you're strictly worse off unless you're confident you will somehow be treated differently despite all the noise in the machine.
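The prisoner's dilemma structure can be sketched with a toy contest model: slots are fixed, every submission is an equal-quality lottery ticket, and each paper costs the average fee. The $50,000 value per publication is an arbitrary stand-in, and the whole setup is stylized:

```python
SLOTS = 3_800
RESEARCHERS = 12_000
FEE = 112              # average submission fee, dollars
AI_OVERHEAD = 2_400    # annual Claude Max subscription, dollars
PUB_VALUE = 50_000     # assumed dollar value of one publication

def payoff(my_papers, others_papers, pays_overhead):
    """Expected publications (valued at PUB_VALUE) minus dollar costs,
    when slots are allocated in proportion to submissions."""
    total = my_papers + (RESEARCHERS - 1) * others_papers
    expected_pubs = my_papers * SLOTS / total
    cost = my_papers * FEE + (AI_OVERHEAD if pays_overhead else 0)
    return expected_pubs * PUB_VALUE - cost

# Others stay at 3: scaling to 10 is individually profitable.
print(payoff(10, 3, True) - payoff(3, 3, False))   # positive
# Everyone at 10: same expected pubs as before, but everyone pays more.
print(payoff(10, 10, True) - payoff(3, 3, False))  # negative
```

When everyone scales, expected publications per person return to exactly where they started, and the change in payoff is just the extra spending — the dilemma in two print statements.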
Institutional responses
But that's all short-run stuff. What about the long run? Well, in the long run, all fixed inputs are variable, so we might expect some things we call non-malleable to turn out very malleable. Things like submission fees.
If the demand for slots is inelastic, then we should absolutely expect journal fees to rise. I expect higher submission fees, which will fall hardest on junior faculty with heavier teaching loads, researchers in developing countries, and anyone without grant funding or a generous research budget.
The returns to top 5s will also rise, for a while anyway, since the rising volume of papers will cause acceptance rates to decline. At the moment, very few AI-automated papers can compete head to head against AER-equivalent publications, but some will, because the normal distribution produces theoretically long tails stretching to positive and negative infinity. Murphy's law says anything that can happen will happen, given enough trials. What limits this is whether enough people will push the capacity as far as it will go — but it's absolutely there to be pushed. The restraint is about norms more than capability.
But to manage that, I do suspect we'll see AI screening at the desk. If LLMs already produce high-quality referee reports, why wouldn't editors use them to cull the herd? That's the genius of Ben's business model — it helps those submitting, and as paper production rises, its revenues grow, from early evaluations to, most likely, a second evaluation of the identical manuscript, maybe done minutes later, by the editor the team just submitted to. Duplicate evaluations are likely to happen too, not counting the earlier polishing and the later polishing once the R&R hits.
The result: more papers, roughly the same publications, journals earning more, research services earning more (and most likely double dipping too), referees fielding more requests, and faculty spending thousands more per year only to remain at equilibrium without any clear technological advantage. The deadweight loss from this arms race is probably not strictly zero.
What I think is coming
Even with AI screening at the desk, the noise doesn't disappear — it most likely just migrates. Perfect automated screening can answer "is this paper competent?" But it can't answer "is this paper more important than that one?" And when 20,000 competent papers are competing for 3,800 slots, the final selection rests on something other than quality — editor taste, topic fashion, referee mood, institutional priors. Below 1% acceptance, you're choosing among a crowd of qualified papers using criteria that are increasingly arbitrary.
And there's a tell. Look at people's websites. Right now, a productive economist might have 6-12 working papers listed. In two years, with automation, is someone really going to put up 75 unpublished manuscripts on their website? That's the paper mill signature, visible to everyone — hiring committees, tenure reviewers, grant panels. Even if every paper is competent, 75 unpublished manuscripts says "this person is playing the lottery," not "this person is doing important research." The people who benefit most from this equilibrium are those already producing 1-2 excellent papers a year who use AI to make each paper better, not more numerous. The people who may be unexpectedly penalized are those who scale paper production into larger and larger volume, because volume is visible — on websites, but also to editors — and it will suggest a person who writes papers rather than does research, and the market will price that accordingly, whatever that turns out to be.
And remember — this is the worst version of these tools we'll ever use. Project APE's most recent cohort has already improved from a 4.7% to a 7.6% win rate in those head-to-head competitions. The quality distribution is changing with scale, and it's partly drifting rightward. Once AI papers start becoming competitive not just at the field journal level but at general interest journals, that's when the arms race intensifies most, because the automated submissions aren't just filling the left tail anymore. They're competing for the same slots at the best journals — which becomes easier to justify, since presumably those are the most important papers scientifically too.
The binding constraint on science is shifting from production to evaluation. The queue to get evaluated — not the difficulty of doing the work — becomes what determines how fast knowledge advances. And the honest question nobody wants to answer is whether human gatekeeping is still the right way to manage that queue, or whether we should let the same tools that caused the flood help sort through it.
I think the noticeable disruptions are three months out, not three years. The supply curve has already shifted. The demand curve for publication slots hasn't moved. Everything else follows from that.