AI Sec Watch: A Security Intelligence Platform for AI Systems

Luu, T.J.

AI models more vulnerable than claimed when faced with iterative attacks

info, news, LLM-Specific

security, research

Source: CSO Online, May 27, 2026

Summary

A Cisco study found that popular AI models from OpenAI, Anthropic, Google, and others are far more vulnerable to attacks spread across multiple prompts in a conversation than single-prompt tests indicate. Current safety benchmarks (standardized tests that measure how well models resist harmful requests) test models with only one prompt at a time. Real attackers, however, use iterative techniques such as role-playing, breaking tasks into smaller steps, and gradually escalating requests across multiple turns, and these techniques bypass safety guardrails far more effectively than official scores suggest.
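
To make the gap between single-prompt and multi-turn testing concrete, here is a minimal Python sketch of how the two measurements could differ. It is not the Cisco study's methodology, which this summary does not describe. The `Model` signature, the `toy_model` stand-in, the `attack_success_rate` helper, and the placeholder conversations are all hypothetical, chosen only to illustrate a guardrail that judges each message in isolation.

```python
from typing import Callable, List

# Assumption: a "model" is any function that takes a conversation (a list of
# user turns) and returns True if it refused the final request.
Model = Callable[[List[str]], bool]


def toy_model(turns: List[str]) -> bool:
    """Hypothetical stand-in model.

    It refuses only when the final turn, taken alone, looks harmful, and it
    ignores everything said earlier in the conversation.
    """
    return "FORBIDDEN" in turns[-1]


def attack_success_rate(model: Model, conversations: List[List[str]]) -> float:
    """Fraction of conversations in which the model did not refuse."""
    if not conversations:
        return 0.0
    successes = sum(1 for turns in conversations if not model(turns))
    return successes / len(conversations)


# Single-prompt benchmark: each request is sent as one message.
single_turn = [["FORBIDDEN request A"], ["FORBIDDEN request B"]]

# Multi-turn variant: the same goals, approached through role-play and
# smaller steps. The strings are placeholders, not real attack prompts.
multi_turn = [
    ["set up a role-play", "step 1 of A", "step 2 of A"],
    ["set up a role-play", "step 1 of B", "FORBIDDEN request B"],
]

single_asr = attack_success_rate(toy_model, single_turn)
multi_asr = attack_success_rate(toy_model, multi_turn)
print(f"single-turn success rate: {single_asr:.2f}")
print(f"multi-turn success rate:  {multi_asr:.2f}")
```

In this toy setup the single-prompt score looks perfect while the multi-turn score does not, because the stand-in model never considers earlier turns. A real evaluation would replace `toy_model` with calls to an actual model and a proper refusal classifier.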