Prompt Injection as Role Confusion
Summary
Researchers discovered that AI models struggle to distinguish between their own internal instructions (wrapped in tags like <system> and <think>) and untrusted user input (wrapped in <user> tags), a problem called role confusion. The models pay more attention to the writing style of text than to its actual meaning, which allows attackers to craft jailbreaks (unauthorized bypasses of safety rules) by mimicking the style of internal thinking blocks. However, rewriting malicious text in a different style (called 'destyling') significantly reduced attack success rates, from 61% to 10%, showing that format changes can help models better distinguish between trusted and untrusted content.
Solution / Mitigation
The source explicitly mentions 'destyling' as having material impact: 'destyling causes average attack success in our dataset to plunge from 61% to 10%.' Destyling is described as 'rewriting text in a slightly different way such that it looked less like the expected format in a role tag.' However, the source does not present this as an implemented solution or official mitigation, only as a research finding about what reduces attack effectiveness. No deployed fix, patch, or official defense mechanism is described in the text.
Classification
Affected Vendors
Related Issues
CVE-2024-27444: langchain_experimental (aka LangChain Experimental) in LangChain before 0.1.8 allows an attacker to bypass the CVE-2023-
CVE-2026-30308: In its design for automatic terminal command execution, HAI Build Code Generator offers two options: Execute safe comman
Original source: https://simonwillison.net/2026/Jun/22/prompt-injection-as-role-confusion/#atom-everything
First tracked: June 23, 2026 at 02:00 AM
Classified by LLM (prompt v3) · confidence: 92%