Anthropic Built an AI That Scared Itself. Here's What Claude Mythos Actually Did.

Anthropic builds AI and tries to make it safe. That's literally their founding pitch. So when Anthropic itself looks at one of its own models and says "no, the world cannot have this yet" -- you pay attention.

That's exactly what happened with Claude Mythos Preview.

What is Claude Mythos?

Claude Mythos is Anthropic's newest frontier model -- a general-purpose AI in the same family as Claude Opus and Sonnet but significantly more capable. It was first spotted in late March when Fortune discovered a reference to it in an unsecured database on Anthropic's own website. The leak described a model that posed "unprecedented cybersecurity risks."

On April 8, 2026, Anthropic made it official: Mythos exists, it's powerful, and it will not be publicly released.

This is the first time in nearly seven years that a major AI company has publicly withheld a model over safety concerns. The last time this happened was 2019, when OpenAI held back GPT-2 over fears it could be used to generate disinformation at scale.

What Can It Do That Scared Everyone?

The short answer: it can hack.

Not in a "it can write a basic SQL injection snippet" way. In a "set it loose overnight and wake up to a working exploit" way.

Anthropic's own researchers did exactly that. They asked Mythos to search for vulnerabilities overnight. By morning, it had found one -- and already built a working attack around it.

Logan Graham, who leads offensive cyber research at Anthropic, said Mythos Preview can:

Identify zero-day vulnerabilities across major operating systems and web browsers
Weaponize those vulnerabilities -- not just find them, but build functional exploits
Do all of this without requiring deep security expertise from the person prompting it

The benchmark numbers back this up. On CyberGym -- an industry benchmark that tests AI agents on vulnerability analysis -- Mythos scored 83.1%. Claude Opus 4.6, which previously led the rankings, scored 66.6%. That's not a marginal improvement. That's a different class of capability.

The OpenBSD Bug That Broke People's Brains

One example from the system card is genuinely alarming. Mythos found a critical vulnerability in OpenBSD -- a highly secure operating system used to protect firewalls and critical infrastructure -- that had been sitting undetected for 27 years.

The bug could allow someone to crash a system remotely just by connecting to it. Twenty-seven years. Hidden in plain sight. Mythos found it overnight.

Anthropic says Mythos also found thousands of high- and critical-severity bugs across every major operating system and web browser currently in use. Billions of devices. Running software with holes that no human had caught yet.

Anthropic Built an AI That Scared Itself. Here's What Claude Mythos Actually Did.

What is Claude Mythos?

What Can It Do That Scared Everyone?

The OpenBSD Bug That Broke People's Brains

Arbind Singh

Comments

I searched 512,000 lines of Claude's source code. Those "secret codes" aren't in there.

It Also Did Some Genuinely Weird Things

So What's Happening Instead? Project Glasswing.

Is This Responsible Development or a PR Move?

What This Means If You're Building With AI

TL;DR