AI models more vulnerable than claimed when faced with iterative attacks
Summary
A Cisco study found that popular AI models from OpenAI, Anthropic, Google, and other vendors are far more vulnerable to attack in multi-turn conversations than in single-prompt tests. Current safety benchmarks (standardized tests that measure how well models resist harmful requests) evaluate models only one prompt at a time. Real attackers, however, work iteratively: they use role-playing, break a task into smaller steps, and gradually escalate their requests across multiple turns. These techniques bypass safety guardrails far more effectively than the official benchmark scores suggest.
Original source: https://www.csoonline.com/article/4177903/ai-models-more-vulnerable-than-claimed-when-faced-with-iterative-attacks.html
First tracked: May 27, 2026 at 08:00 PM
Classified by LLM (prompt v3) · confidence: 92%