Trap: Mitigating Poisoning-Based Backdoor Attacks by Treating Poison With Poison
Summary
This research addresses backdoor attacks, in which maliciously altered samples inserted into a training set cause a neural network to misbehave on specific trigger inputs. The authors propose a defense called Trap that detects poisoned samples early in training by exploiting the fact that they cluster separately from legitimate data, then removes the backdoor by retraining part of the model on relabeled poisoned samples. They report very high attack detection rates with minimal accuracy loss.
Solution / Mitigation
The paper proposes detecting poisoned samples during early training and removing the backdoor by retraining the classifier part of the model on relabeled poisoned samples. The authors report that the method reduced the average attack success rate to 0.07% while decreasing average accuracy by only 0.33% across twelve attacks on four datasets.
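The detection idea above rests on poisoned samples clustering apart from legitimate ones. A minimal sketch of that intuition, not the paper's exact Trap procedure, is to run 2-means over penultimate-layer features and flag the smaller cluster as suspected poison (the synthetic features, farthest-point initialization, and "smaller cluster is poison" heuristic here are all assumptions for illustration):

```python
import numpy as np

def detect_poison_by_clustering(features, n_iter=50):
    """Split feature vectors into two clusters with 2-means and flag the
    smaller cluster as suspected poison. Illustrative sketch only: backdoor
    triggers are assumed rare, so the minority cluster is treated as suspect."""
    # Farthest-point initialization: first sample, then the sample farthest
    # from it, so the two starting centroids land in separated clusters.
    c0 = features[0]
    c1 = features[np.linalg.norm(features - c0, axis=1).argmax()]
    centroids = np.stack([c0, c1]).astype(float)
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned samples.
        for k in (0, 1):
            if np.any(labels == k):
                centroids[k] = features[labels == k].mean(axis=0)
    # Flag members of the smaller cluster as suspected poison.
    suspect = int(np.bincount(labels, minlength=2).argmin())
    return labels == suspect

# Synthetic demo: 95 "clean" samples near the origin, 5 "poisoned" far away.
rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, size=(95, 8))
poison = rng.normal(8.0, 1.0, size=(5, 8))
flags = detect_poison_by_clustering(np.vstack([clean, poison]))
print(int(flags.sum()))  # → 5
```

In the defense as summarized, the flagged samples would then be relabeled and used to retrain the classifier part of the model, turning the poison against the backdoor it installed.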
Classification
Related Issues
Original source: http://ieeexplore.ieee.org/document/11300825
First tracked: March 16, 2026 at 08:02 PM
Classified by LLM (prompt v3) · confidence: 92%