Here’s a bit of snark from developer John Crickett on X:
Software engineers: Context switching kills productivity. Also software engineers: I’m now managing 19 AI agents and doing 1,800 commits a day.
Crickett’s quip lands precisely because it isn’t really a joke. It’s a preview of the next management fad, in which we replace one bad productivity proxy (lines of code) with an even worse one (agent output), then act shocked when quality collapses.
And yes, I know, nobody is doing 1,800 meaningful commits. But that’s the point. The metric is already being gamed, and agents make gaming trivial. If your organization starts celebrating “commit velocity” in the agent era, you aren’t measuring productivity. You’re measuring how quickly your team can manufacture liability.
The great promise of generative AI was that it would finally clear our backlogs. Coding agents would churn out boilerplate at superhuman speed, and teams would finally ship exactly what the business wants. The reality, as we settle into 2026, is far more uncomfortable. AI is not going to save developer productivity, because writing code was never the bottleneck in software engineering. The real bottleneck is validation. Integration. Deep system understanding. Generating code without a rigorous validation framework is not engineering. It’s simply mass-producing technical debt.
So what do we change?
Thinking correctly about code
First, as I argued recently, we need to stop thinking about code as an asset in isolation. Every single line of code is surface area that must be secured, observed, maintained, and stitched into everything around it. As such, making code cheaper to write doesn’t reduce the total amount of work; it increases it, because you end up manufacturing more liability per hour.
For years, we treated developers like highly paid Jira ticket translators. The assumption was that you could take a well-defined requirement, convert it to syntax, and ship it. Crickett rightfully points out that if that is all you’re doing, then you’re entirely replaceable. A machine can do basic translation, and a machine is perfectly happy to do it all day without complaining.
What a machine can’t do, however, is understand critical business context. AI can’t feel the financial cost of a compliance mistake or look at a customer workflow and instinctively recognize that the underlying requirement is fundamentally wrong. For this we need people, and we need people to thoughtfully consider exactly what they want AI to do.
Crickett frames this transition as a necessary move toward spec-driven development. He’s right, but we need to be extremely clear about what a specification means in the agent era. It’s not just another Jira ticket but, rather, a set of constraints tight enough to ensure an LLM can’t escape them. In other words, it’s an executable definition of done, backed entirely by tests, API contracts, and strict production alerts. This is exactly the kind of foundational work we’ve underinvested in for decades because it doesn’t look like raw output; it looks like process. You know, that “boring stuff” that slows you down.
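To make “executable definition of done” concrete, here is a minimal sketch in Python. The business rule and the function name (`apply_discount`) are hypothetical; the point is that the spec lives in assertions an agent’s implementation either passes or fails, not in a ticket’s prose:

```python
# Sketch: a spec as executable constraints. The rule itself is invented
# for illustration -- what matters is that "done" is machine-checkable.

def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical business rule an agent is asked to implement."""
    if not (0 <= percent <= 50):  # constraint: discounts are capped at 50%
        raise ValueError("discount out of allowed range")
    return price_cents - (price_cents * percent) // 100

def test_spec():
    assert apply_discount(1000, 10) == 900   # happy path
    assert apply_discount(999, 0) == 999     # 0% discount is an identity
    try:
        apply_discount(1000, 60)             # over-cap input must fail loudly
        raise AssertionError("spec violated: discount over cap accepted")
    except ValueError:
        pass

test_spec()
print("spec satisfied")
```

An agent can regenerate the implementation endlessly; the constraints are what it cannot escape.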
You can see the friction playing out in real time just by looking at the comments on Crickett’s tweet. You’ll find people desperately trying to square the circle of agentic development. One commenter tries to reframe the chaos by calling it architecture versus engineering. Another insists that managing 19 agents is actually orchestrating, not context switching. A third bluntly states that running more than five agents concurrently starts to look like vibe coding, which is merely a polite phrase for gambling with production systems. They’re all highlighting the core issue: You haven’t eliminated the work. You’ve just moved it from implementation to supervision and review.
The more you parallelize your code generation, the more “review debt” you create.
Observability to the rescue
This is where Charity Majors, the co-founder and CTO of Honeycomb, gets frustrated. Majors has argued for years that you can’t really know whether code works until you run it in production, under real load, with real users and real failure modes. When you use AI agents, the burden of development shifts entirely from writing to validating. Humans are notoriously bad at validating code simply by reading large pull requests. We validate systems by observing their behavior in the wild.
Now take that idea one step further into the agent era. For decades, one of the most common debugging techniques was entirely social. A production alert goes off. You look at the version control history, find the person who wrote the code, ask them what they were trying to accomplish, and reconstruct the architectural intent. But what happens to that workflow when no one actually wrote the code? What happens when a human merely skimmed a 3,000-line agent-generated pull request, hit merge, and moved on to the next ticket? When an incident happens, where is the deep knowledge that used to live inside the author?
This is precisely why rich observability is not a nice-to-have feature in the agent era. It’s the only viable substitute for the missing human. We need instrumentation that captures intent and business outcomes, not just generic logs that say something happened. We need distributed traces and high-cardinality events rich enough that we can answer exactly what changed, what it affected, and why it failed. Otherwise, we’re trying to operate a black box built by another black box.
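What does an intent-carrying, high-cardinality event look like in practice? Here’s a minimal sketch using only Python’s standard library; every field name (`change_id`, `spec_ref`, and so on) is illustrative rather than a real schema:

```python
import json
import time

# Sketch: a structured event that records not just that something happened,
# but which change was live and what intent that change claimed to satisfy.
# Field names are invented for illustration.

def emit_event(**fields):
    event = {"timestamp": time.time(), **fields}
    print(json.dumps(event, sort_keys=True))  # stand-in for a real event pipeline
    return event

evt = emit_event(
    service="payments",
    endpoint="/charge",
    customer_id="cus_8f3a",           # high-cardinality dimension: the exact customer
    change_id="agent-pr-4121",        # which (agent-generated) change is running
    spec_ref="SPEC-payments-limits",  # the intent the code claims to implement
    outcome="declined",
    reason="card_limit_exceeded",
)
```

During an incident, the `change_id` and `spec_ref` fields stand in for the conversation you can no longer have with the author.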
Majors also offers critical operational advice: Deploy freezes are a complete hack. The common human instinct when change feels dangerous is to stop deploying. But if you keep merging agent-generated code while not deploying it, you’re simply batching risk, not reducing it. When you finally execute a deploy, you’ll have absolutely no idea which specific AI hallucination just took down your payment gateway. So if you want to freeze anything, freeze merges. Better yet, make the merge and the deploy feel like one single atomic action. The faster that loop runs, the less variance you have, and the easier it is to pinpoint exactly what broke.
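A back-of-envelope calculation shows why batching merges multiplies risk. Assume, purely for illustration, that each change independently breaks production with some small probability:

```python
# Sketch: if each change is bad with probability p, a batched deploy of k
# changes fails with probability 1 - (1 - p)^k, and every one of the k
# changes is a suspect. Numbers are illustrative.

def batch_failure_probability(p: float, batch_size: int) -> float:
    """Probability that a deploy containing `batch_size` changes breaks something."""
    return 1 - (1 - p) ** batch_size

p = 0.02  # assume a 2% chance any single change is bad
for k in (1, 20):
    print(f"{k:>2} changes per deploy: "
          f"{batch_failure_probability(p, k):.1%} breakage risk, {k} suspect(s)")
```

Shipping one change at a time keeps both the failure probability and the suspect list small, which is the whole argument for making merge and deploy a single atomic step.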
Golden paths are the way
The fix for this impending chaos is not to rely on heroic engineers. As Majors points out, resilient engineering requires a commitment to platform engineering and golden paths (something I’ve also argued). Golden paths make the right behavior extremely easy and the wrong behavior extremely hard. The most productive teams of the next decade won’t be the ones with the most freedom to use whatever framework an agent suggests, but rather those that operate safely within the best constraints.
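A golden-path guardrail can be as simple as a build check that rejects off-path choices before they ever ship. This is a minimal sketch; the approved set and package names are invented:

```python
# Sketch: fail the build when a change pulls in a dependency outside the
# platform's approved set. Package names here are purely illustrative.

APPROVED = {"fastapi", "sqlalchemy", "pydantic"}

def check_dependencies(requested: set) -> list:
    """Return any requested dependencies that stray off the golden path."""
    return sorted(requested - APPROVED)

violations = check_dependencies({"fastapi", "leftpad-revival"})
if violations:
    print(f"off the golden path: {violations}")  # a real check would fail the build here
```

The point is not this particular check but the shape of it: the platform encodes the constraint once, so no reviewer has to catch an agent’s creative framework choice by eye.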
So how do you measure success in the agentic era?
The metrics that matter are still the boring ones, because they measure actual business outcomes. The DORA metrics remain the best sanity check we have because they tie delivery speed directly to system stability. They measure deployment frequency, lead time for changes, change failure rate, and time to restore service. None of these metrics cares about the number of commits your agents produced today. They only care about whether your system can absorb change without breaking.
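Two of the four DORA metrics fall straight out of deploy records you probably already have. A minimal sketch, with an invented record shape and invented numbers:

```python
# Sketch: change failure rate and average lead time from deploy records.
# The record shape and values are invented for illustration.

deploys = [
    {"date": "2026-01-05", "lead_time_hours": 6, "failed": False},
    {"date": "2026-01-06", "lead_time_hours": 3, "failed": True},
    {"date": "2026-01-07", "lead_time_hours": 4, "failed": False},
    {"date": "2026-01-08", "lead_time_hours": 2, "failed": False},
]

change_failure_rate = sum(d["failed"] for d in deploys) / len(deploys)
avg_lead_time = sum(d["lead_time_hours"] for d in deploys) / len(deploys)

print(f"change failure rate: {change_failure_rate:.0%}")
print(f"avg lead time: {avg_lead_time:.1f} hours")
```

Notice what isn’t in the record: commit counts. Whether a deploy contained one human commit or fifty agent commits is irrelevant to whether the system absorbed the change.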
So, yes, use coding agents. Use them aggressively! But don’t confuse code generation with productivity. Productivity is what happens after code generation, when code is constrained, validated, observed, deployed, rolled back, and understood. That’s the key to business safety and developer productivity.
