AI Sec Watch: A Security Intelligence Platform for AI Systems

Luu, T.J.

Anthropic releases Mythos-class Fable 5 model with safeguards for cyber risks

info · news · LLM-Specific

safety · security

Source: CSO Online, June 9, 2026

Summary

Anthropic has released Claude Fable 5, a powerful AI model based on its restricted Mythos architecture, with built-in safeguards intended to make it safe to offer to the general public. The safeguards automatically route requests about cybersecurity, biology, chemistry, and other high-risk topics to a less capable model (Claude Opus 4.8). Early testing suggests these safeguards may be broader than intended and sometimes block benign requests. The routing is driven by AI-powered classifiers (systems that categorize requests), and Anthropic says internal and external testing found no broadly effective jailbreaks (methods of bypassing safety restrictions) that could consistently get around them.

Solution / Mitigation

Anthropic has developed AI-powered classifiers that identify potentially dangerous requests and redirect them to a less capable model (Claude Opus 4.8). The company states that 'extensive internal and external testing failed to uncover broadly effective jailbreaks that would consistently bypass the safeguards.' Anthropic also describes the safeguards as 'intentionally conservative,' says it is continuing to refine the system, and states that it is prioritizing safety over convenience.