Trap: Mitigating Poisoning-Based Backdoor Attacks by Treating Poison With Poison
Summary
This research addresses backdoor attacks, in which maliciously altered samples inserted into a training set cause a neural network to misbehave on specific trigger inputs. The authors propose a defense called Trap that detects poisoned samples early in training by exploiting the fact that they cluster separately from legitimate data, then removes the backdoor by retraining part of the model on relabeled poisoned samples. They report very high attack detection rates with minimal accuracy loss.
Solution / Mitigation
The paper proposes detecting poisoned samples during early training and removing the backdoor by retraining the classifier part of the model on relabeled poisoned samples. The authors report that the method reduced the average attack success rate to 0.07% while decreasing average accuracy by only 0.33% across twelve attacks on four datasets.
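The detection idea above rests on poisoned samples clustering apart from legitimate ones. A minimal sketch of that intuition, not the paper's exact Trap procedure, is to run 2-means over penultimate-layer features and flag the smaller cluster as suspected poison (the synthetic features, farthest-point initialization, and "smaller cluster is poison" heuristic here are all assumptions for illustration):

```python
import numpy as np

def detect_poison_by_clustering(features, n_iter=50):
    """Split feature vectors into two clusters with 2-means and flag the
    smaller cluster as suspected poison. Illustrative sketch only: backdoor
    triggers are assumed rare, so the minority cluster is treated as suspect."""
    # Farthest-point initialization: first sample, then the sample farthest
    # from it, so the two starting centroids land in separated clusters.
    c0 = features[0]
    c1 = features[np.linalg.norm(features - c0, axis=1).argmax()]
    centroids = np.stack([c0, c1]).astype(float)
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned samples.
        for k in (0, 1):
            if np.any(labels == k):
                centroids[k] = features[labels == k].mean(axis=0)
    # Flag members of the smaller cluster as suspected poison.
    suspect = int(np.bincount(labels, minlength=2).argmin())
    return labels == suspect

# Synthetic demo: 95 "clean" samples near the origin, 5 "poisoned" far away.
rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, size=(95, 8))
poison = rng.normal(8.0, 1.0, size=(5, 8))
flags = detect_poison_by_clustering(np.vstack([clean, poison]))
print(int(flags.sum()))  # → 5
```

In the defense as summarized, the flagged samples would then be relabeled and used to retrain the classifier part of the model, turning the poison against the backdoor it installed.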
Classification
Related Issues
Original source: http://ieeexplore.ieee.org/document/11300825
First tracked: March 16, 2026 at 08:02 PM
Classified by LLM (prompt v3) · confidence: 92%