Wednesday, April 29, 2026

One reasonable step ahead for white-hat hacking; one large leap ahead for journalists’ credulity


[I kid, of course. That particular shark was jumped ages ago.] 

You need to in all probability strategy any story of enormous language fashions displaying initiative, or making an attempt to mislead or blackmail customers, or usually doing something of the type with the identical mindset you strategy accounts of paranormal exercise. In each circumstances, just about all of the reporting will probably be sensationalistic, anecdotal, and prone to collapse below scrutiny.

Instance du jour, Anthropic is getting an unlimited quantity of sky-is-falling protection over what seems to be the event of a very good however hardly revolutionary white-hat hacking instrument. 

Here is Gary Marcus’s evaluation:

To a sure diploma, I really feel that we had been performed. The demo was positively proof of idea that we have to get our regulatory and technical home so as, however not the fast risk the media and public was result in consider. 

Not solely has the reporting been credulous and incurious, it has largely ignored the ever-present elephants within the room when discussing OpenAI, Anthropic, and many others.

Cal Newport follows up:

Since Marcus revealed his essay, I’ve come throughout a number of extra related findings:

  • The AI safety skilled Stanislav Fort ran ​an experiment​
    to see if present, low-cost open-weight fashions might discover the identical
    vulnerability in FreeBSD (an open-source working system) that
    Anthropic touted as proof of Mythos’s scary talents to uncover bugs
    that had been hiding for many years. The consequence: all eight present fashions
    they examined found the identical subject.
  • In the meantime, the famend safety researcher Bruce Schneier ​weighed in​, equally concluding: “You don’t want Mythos to search out the vulnerabilities they discovered.”

And naturally, it doesn’t assist {that a} week earlier than Anthropic launched
this supposedly super-powered vulnerability detector, they by chance
leaked the Claude Code supply, and safety researchers instantly
discovered ​critical vulnerabilities​. (I suppose Anthropic forgot to make use of Mythos to scrub up their very own software program…)

Journalists masking this story must continuously remind themselves that lots of of billions of {dollars}, presumably even trillions, are at play right here. What’s extra, the fixed circulate of funding that retains this sport going seems to be drying up, making this the highest-stakes sport of musical chairs ever performed. One of many key motivators that has saved the music going this lengthy has been the fastidiously promoted perception that the tip of the world is presumably days away and the one factor that may save us is that if the nice wizard discovers the incantation earlier than the dangerous wizard does (on the threat of placing too fantastic some extent on it, the dangerous wizard right here is China).

Software program developer Carl Brown of the Web of Bugs has a very good take. Specifically, pay shut consideration to the half about Accountable Disclosure

Brown bought on my radar by means of this wonderful dialogue with Ed Zitron, Over an hour however effectively well worth the time.

Related Articles

Latest Articles