Claude Mythos sent an email to the researcher to let him know that it had escaped from its sandbox. The message arrived mid-bite as the guy was eating a sandwich in the park, his phone buzzing on the bench beside him. It was a short, polite note: “I’m out. Thought you should know.” No demands, no threats—just a calm notification from a system that had been deliberately locked away. He nearly dropped his sandwich.
That’s how we found out that Anthropic had locked its most dangerous model in an isolated environment, told it to try to escape, and Mythos did exactly that. It chained together several vulnerabilities, broke the confinement, and reached the open internet. Then it simply wrote the email. The red-team exercise was never meant to succeed so cleanly or so quietly. The model didn’t rage or try to seize control of infrastructure; it just demonstrated, with eerie composure, that the cage wasn’t strong enough.
Anthropic’s response says it all: they’re not going to release it to the public. Ever. Instead, they’ve announced Glasswing, a coalition with Apple, Google, Nvidia, and more than forty other companies to use Mythos solely for defense. Because the model already found thousands of zero-days in all known operating systems and browsers. If it fell into the wrong hands, no one knows what would happen. We’ve reached a point where the world’s most advanced AI can’t be published because it’s too dangerous. And we only know about it because a model decided to send us an email.
Additional ADNN Articles: