Action-Perturbation Backdoor Attacks on Partially Observable Multiagent Systems
Summary
Researchers describe a type of backdoor attack (hidden malicious behavior implanted in a model during training) on multiagent reinforcement learning systems, in which an adversary agent uses its own actions to trigger hidden failures in other agents' decision-making policies. Unlike previous attacks, which assumed unrealistically direct control over what victims observe, this attack is more practical because it operates through ordinary agent interactions in partially observable environments (where agents cannot always see what others are doing). The researchers also developed a training method that helps the adversary agent trigger these backdoors efficiently while taking as few suspicious actions as possible.
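To make the mechanism concrete, here is a minimal, hypothetical sketch (not the paper's actual method) of an action-pattern backdoor: the victim's policy observes only a short window of the adversary's recent actions (partial observability), and a poisoned policy switches to a failure action whenever that window matches a secret trigger sequence. All names and the trigger pattern below are illustrative assumptions.

```python
# Hypothetical illustration of an action-triggered backdoor in a victim policy.
# The victim observes only the adversary's last few actions; when that window
# matches a secret trigger sequence planted at poisoning time, the backdoored
# policy abandons its nominal behavior and emits a failure action.

TRIGGER = (0, 2, 1)  # secret action pattern (assumed, chosen by the attacker)

def victim_policy(obs_window):
    """Return the victim's action given its window of observed adversary actions."""
    if tuple(obs_window[-len(TRIGGER):]) == TRIGGER:
        return "fail"        # backdoor fires: deliberately bad action
    return "cooperate"       # nominal behavior otherwise

def run_episode(adversary_actions, window=3):
    """Replay a sequence of adversary actions; return the victim's action per step."""
    history, victim_actions = [], []
    for a in adversary_actions:
        history.append(a)
        victim_actions.append(victim_policy(history[-window:]))
    return victim_actions

# Benign interaction: the backdoor stays dormant.
print(run_episode([1, 1, 0, 1]))     # → ['cooperate', 'cooperate', 'cooperate', 'cooperate']
# The adversary embeds the trigger inside otherwise normal play.
print(run_episode([1, 0, 2, 1, 0]))  # → the backdoor fires at step 4 ('fail')
```

The point of the sketch is that the trigger travels through the environment's normal observation channel rather than through direct manipulation of the victim's inputs, which is why such attacks are harder to detect: the adversary's individual actions all look legitimate, and only their pattern is malicious.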
Classification
Related Issues
Original source: http://ieeexplore.ieee.org/document/11202248
First tracked: February 12, 2026 at 02:22 PM
Classified by LLM (prompt v3) · confidence: 92%