AI Sec Watch: A Security Intelligence Platform for AI Systems

Luu, T.J.

Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude

info, news, LLM-Specific

safety, policy

Source: Simon Willison's Weblog, June 10, 2026

Summary

Anthropic reversed a policy in Claude Fable 5 that silently blocked requests related to frontier LLM development (cutting-edge AI research) without notifying users. The company acknowledged that the hidden approach was wrong and apologized, saying it had prioritized speed over transparency.

Solution / Mitigation

Anthropic is making the safeguards visible: effective immediately, flagged requests will visibly fall back to Opus 4.8 (an older model version) instead of being silently blocked. On the API, refused requests will now return a reason for the refusal (rolling out to server-side fallback within days). Users will be shown every instance in which this happens.
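
For API users who want to log these events on their side, a client-side check might look like the minimal sketch below. It is an illustration only: the response shape (a `model` field naming the model that served the request and a `refusal_reason` field), the helper `inspect_response`, and the placeholder model names are all hypothetical and are not taken from the source or from Anthropic's documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FallbackReport:
    fell_back: bool        # a different model than the requested one answered
    refused: bool          # the response carried a refusal reason
    reason: Optional[str]  # the refusal reason, if any


def inspect_response(requested_model: str, response: dict) -> FallbackReport:
    """Summarize whether a response was refused or served by another model.

    Assumes a HYPOTHETICAL response shape:
        {"model": str, "refusal_reason": str | None}
    Real field names may differ; check the provider's API reference.
    """
    served_model = response.get("model")
    reason = response.get("refusal_reason")
    # A fallback is only reported when the served model is known and differs.
    fell_back = served_model is not None and served_model != requested_model
    refused = reason is not None
    return FallbackReport(fell_back=fell_back, refused=refused, reason=reason)


if __name__ == "__main__":
    # Placeholder model names, used purely for illustration.
    report = inspect_response(
        "requested-model",
        {"model": "fallback-model", "refusal_reason": None},
    )
    if report.fell_back:
        print("Request was served by a fallback model.")
    if report.refused:
        print(f"Request was refused: {report.reason}")
```

Comparing the requested model against the one reported in the response lets a client record each fallback itself instead of relying only on what the interface displays.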