MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks
security, research
Source: arXiv (cs.CR + cs.AI)
February 9, 2026
Summary
MUZZLE is an automated agentic framework designed to evaluate the security of LLM-based web agents against indirect prompt injection attacks. The system adaptively identifies injection surfaces from agent trajectories and generates context-aware malicious instructions, successfully discovering 37 new attacks across 4 web applications that violate confidentiality, integrity, and availability properties, including novel cross-application attacks and agent-tailored phishing scenarios.
Original source: https://arxiv.org/abs/2602.09222v1
First tracked: February 11, 2026 at 06:00 PM