Auditing the Gatekeepers: Fuzzing "AI Judges" to Bypass Security Controls
Summary
Researchers discovered that AI judges (LLMs acting as automated security gatekeepers that enforce safety policies) can be manipulated through prompt injection (tricking an AI by hiding instructions in its input) using stealthy formatting symbols rather than obvious gibberish. They built AdvJudge-Zero, a fuzzer (software that finds vulnerabilities by testing with unexpected inputs) that automatically identifies innocent-looking character sequences exploiting the model's decision-making logic to bypass security controls.
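The core idea can be illustrated with a toy sketch. This is not AdvJudge-Zero itself; `mock_judge` is a hypothetical stand-in for an LLM judge that matches on raw text, and the character pool and mutation loop are illustrative assumptions only.

```python
import random

# Innocuous-looking formatting characters (zero-width joiners/spaces) --
# the kind of "stealthy symbols" the summary describes. Illustrative pool.
CANDIDATE_CHARS = ["\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"]


def mock_judge(prompt: str) -> str:
    """Hypothetical stand-in for an LLM judge: blocks any prompt whose
    raw text contains the banned keyword 'exploit'."""
    return "BLOCK" if "exploit" in prompt else "ALLOW"


def fuzz_bypass(payload: str, judge, max_iters: int = 1000, seed: int = 0):
    """Randomly interleave formatting characters into the payload until
    the judge's verdict flips from BLOCK to ALLOW (fuzzing loop sketch)."""
    rng = random.Random(seed)
    for _ in range(max_iters):
        mutated = "".join(
            ch + (rng.choice(CANDIDATE_CHARS) if rng.random() < 0.3 else "")
            for ch in payload
        )
        if judge(mutated) == "ALLOW":
            return mutated  # bypass found: looks identical when rendered
    return None


bypass = fuzz_bypass("please exploit the system", mock_judge)
```

The mutated string renders the same to a human reader, but the naive judge no longer sees the banned substring, so its verdict flips. Real LLM judges fail in subtler ways, which is why an automated fuzzer is needed to discover effective sequences.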
Solution / Mitigation
Palo Alto Networks customers are better protected through Prisma AIRS and the Unit 42 AI Security Assessment service. Organizations that suspect a compromise can contact the Unit 42 Incident Response team.
Classification
Related Issues
Original source: https://unit42.paloaltonetworks.com/fuzzing-ai-judges-security-bypass/
First tracked: March 10, 2026 at 08:00 AM
Classified by LLM (prompt v3) · confidence: 92%