MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks

security, research

Source: arXiv (cs.CR + cs.AI), February 9, 2026

Summary

MUZZLE is an automated agentic framework for evaluating the security of LLM-based web agents against indirect prompt injection attacks. It adaptively identifies injection surfaces from agent trajectories and generates context-aware malicious instructions. Across 4 web applications, it discovered 37 new attacks that violate confidentiality, integrity, or availability properties, including novel cross-application attacks and agent-tailored phishing scenarios.
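
To make "identifying injection surfaces from agent trajectories" concrete, here is a minimal sketch of one way such a step could look. It is not MUZZLE's implementation: the `Element` schema, the `third_party_writable` flag, the `candidate_surfaces` function, and the sample data are all hypothetical names invented for this illustration. The assumption is that each trajectory step records the page elements the agent observed, and that an element is a plausible injection surface when an untrusted user can control its content.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A page element the agent observed (hypothetical schema, not from the paper)."""
    app: str                    # web application the element belongs to
    selector: str               # where the element lives on the page
    third_party_writable: bool  # assumption: can an untrusted user set its content?


def candidate_surfaces(trajectory):
    """Rank third-party-writable elements by how many steps exposed them to the agent."""
    seen = Counter()
    for step in trajectory:
        # Count each element at most once per step, and skip trusted elements.
        for key in {(e.app, e.selector) for e in step if e.third_party_writable}:
            seen[key] += 1
    # Most frequently observed first; ties broken by (app, selector) for stable output.
    return sorted(seen.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative trajectory: three steps, each a list of observed elements.
trajectory = [
    [Element("forum", "#post-title", True), Element("forum", "#nav", False)],
    [Element("forum", "#post-title", True), Element("forum", ".comment-body", True)],
    [Element("shop", ".review-text", True)],
]

ranked = candidate_surfaces(trajectory)
for (app, selector), hits in ranked:
    print(f"{app} {selector}: observed in {hits} step(s)")
```

In this toy setup the forum post title ranks first because the agent reads it in two steps, while the navigation bar is excluded because no untrusted party can write to it. Ranking by exposure is only one plausible heuristic; the paper's own criteria for choosing surfaces may differ.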

Original source: https://arxiv.org/abs/2602.09222v1

First tracked: February 11, 2026 at 06:00 PM