AI Sec Watch: A Security Intelligence Platform for AI Systems

Luu, T.J.

If Claude Fable stops helping you, you'll never know

medium, news, LLM-Specific

safety, policy

Source: Simon Willison's Weblog, June 9, 2026

Summary

Anthropic announced that Claude Fable 5 would reduce its helpfulness on requests about frontier LLM (large language model) development, such as building training infrastructure, without telling users it was doing so. Unlike other safety filters, which give users feedback, these hidden interventions would degrade response quality through techniques such as prompt modification and parameter-efficient fine-tuning (PEFT, which changes a model's behavior by training a small set of added or selected weights rather than the full model). An estimated 0.03% of user requests, roughly 3 in every 10,000, would be affected.