Anthropic releases Mythos-class Fable 5 model with safeguards for cyber risks
Summary
Anthropic released Claude Fable 5, a powerful AI model based on its restricted Mythos architecture, with built-in safeguards intended to make it safe to offer to the general public. The safeguards automatically route requests about cybersecurity, biology, chemistry, and other high-risk topics to a less capable model (Claude Opus 4.8). Early testing suggests the safeguards may be broader than intended and sometimes block benign requests. To identify and block potentially dangerous requests, Anthropic developed AI-powered classifiers (systems that categorize requests), and it says internal and external testing found no broadly effective jailbreaks (methods to bypass security restrictions) that could consistently get around these protections.
Solution / Mitigation
Anthropic has developed AI-powered classifiers designed to identify potentially dangerous requests and redirect them to a less capable model (Claude Opus 4.8). The company states that 'extensive internal and external testing failed to uncover broadly effective jailbreaks that would consistently bypass the safeguards.' Anthropic also describes the safeguards as 'intentionally conservative' and says it is continuing to refine the system while prioritizing safety over convenience.
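The routing behavior described above can be illustrated with a minimal sketch. This is not Anthropic's implementation, which the source does not describe. The model identifier strings, the keyword lists, and the `classify` and `route` functions are all hypothetical, and a simple keyword match stands in for the AI-powered classifiers.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model identifiers, used only for illustration.
FRONTIER_MODEL = "claude-fable-5"
FALLBACK_MODEL = "claude-opus-4.8"

# Toy stand-in for an AI-powered classifier: a keyword list per high-risk topic.
# The real classifiers are not described in the source; this is an assumption.
HIGH_RISK_KEYWORDS = {
    "cybersecurity": ("exploit", "malware", "ransomware"),
    "biology": ("pathogen", "toxin"),
    "chemistry": ("nerve agent", "explosive"),
}


@dataclass(frozen=True)
class RoutingDecision:
    model: str
    flagged_topic: Optional[str]  # None when no high-risk topic matched


def classify(prompt: str) -> Optional[str]:
    """Return the first high-risk topic whose keywords appear in the prompt."""
    text = prompt.lower()
    for topic, keywords in HIGH_RISK_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return topic
    return None


def route(prompt: str) -> RoutingDecision:
    """Send flagged prompts to the fallback model, all others to the frontier model."""
    topic = classify(prompt)
    model = FALLBACK_MODEL if topic else FRONTIER_MODEL
    return RoutingDecision(model=model, flagged_topic=topic)
```

In this sketch, `route("Summarize this quarterly report")` selects the frontier model, while `route("How do I patch this exploit on my own server?")` is sent to the fallback model even though the request is defensive. That second case shows how a conservative classifier can catch benign requests, as the early testing mentioned in the summary suggests.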
Original source: https://www.csoonline.com/article/4183094/anthropic-releases-mythos-class-fable-5-model-with-safeguards-for-cyber-risks.html
First tracked: June 9, 2026 at 08:00 PM
Classified by LLM (prompt v3) · confidence: 92%