Open, Closed and Broken: Prompt Fuzzing Finds LLMs Still Fragile Across Open and Closed Models
Summary
Researchers created a genetic algorithm-inspired prompt fuzzing method (automatically generating variations of harmful requests while keeping their meaning) that found significant weaknesses in guardrails (safety systems protecting LLMs) across multiple AI models, with evasion rates ranging from low to high depending on the model and keywords used. The key risk is that while individual jailbreak attempts (tricking an AI to ignore its safety rules) may have low success rates, attackers can automate this process at scale to reliably bypass protections. This matters because LLMs are increasingly used in customer support and internal tools, so guardrail failures can lead to safety incidents and compliance problems.
Solution / Mitigation
The source recommends five mitigation strategies: treating LLMs as non-security boundaries, defining scope, applying layered controls, validating outputs, and continuously testing GenAI with adversarial fuzzing (automated testing with malicious inputs) and red-teaming (simulated attacks to find weaknesses). Palo Alto Networks customers can use Prisma AIRS and the Unit 42 AI Security Assessment products for additional protection.
Classification
Affected Vendors
Related Issues
Original source: https://unit42.paloaltonetworks.com/genai-llm-prompt-fuzzing/
First tracked: March 17, 2026 at 08:00 AM
Classified by LLM (prompt v3) · confidence: 92%