Really Unlearned? Verifying Machine Unlearning via Influential Sample Pairs
Summary
Machine unlearning lets a model provider remove the influence of specific training samples from a trained model, but verifying that this removal actually happened is difficult: existing checks such as backdoor attacks and membership inference attacks (which probe whether a model still remembers data by trying to extract or manipulate it) can be fooled by a dishonest provider who simply retrains the model to pass the test rather than truly unlearning. This paper proposes IndirectVerify, a verification scheme built on influential sample pairs: trigger samples, which the user requests to be unlearned, and reaction samples, whose predictions are deliberately tied to the trigger samples through small, intentional perturbations of the training data. Because genuinely unlearning the trigger samples changes the model's behavior on the paired reaction samples, the scheme yields indirect evidence of unlearning that is harder for a dishonest provider to fake.
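A minimal sketch of the trigger/reaction idea in Python, assuming a scikit-learn-style classifier. The pair construction, the sample weighting used to exaggerate the trigger's influence, and all names (trigger_x, reaction) are illustrative assumptions, not the paper's actual algorithm; honest unlearning is simulated here by retraining from scratch without the trigger sample.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy training set: two Gaussian blobs, one per class.
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Hypothetical pair construction: a reaction sample just inside class 0's
# region, and a perturbed trigger sample right next to it with the opposite
# label. A large sample weight stands in for the trigger's engineered influence.
reaction = np.array([[-0.3, -0.3]])
trigger_x = reaction + rng.normal(0.0, 0.05, (1, 2))  # small perturbation
trigger_y = np.array([1])
weights = np.concatenate([np.ones(len(y)), [20.0]])

# Model trained WITH the trigger sample: the reaction sample's prediction
# should follow the trigger's label.
model_with = LogisticRegression(C=10.0).fit(
    np.vstack([X, trigger_x]),
    np.concatenate([y, trigger_y]),
    sample_weight=weights,
)
pred_before = model_with.predict(reaction)[0]

# Honest unlearning, simulated as retraining from scratch without the trigger.
model_without = LogisticRegression(C=10.0).fit(X, y)
pred_after = model_without.predict(reaction)[0]

# Indirect verification: if the trigger's influence was truly removed, the
# reaction sample's prediction flips back to its natural class.
print(f"prediction with trigger: {pred_before}, after unlearning: {pred_after}")
print("unlearning verified:", pred_before != pred_after)

In this toy setup the verifier never inspects the model's parameters; it only compares the reaction sample's prediction before and after the unlearning request, which is what makes the evidence indirect.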
Original source: http://ieeexplore.ieee.org/document/11202435
First tracked: February 12, 2026 at 02:22 PM
Classified by LLM (prompt v3) · confidence: 85%