Really Unlearned? Verifying Machine Unlearning via Influential Sample Pairs
Summary
Machine unlearning lets a model provider remove the influence of specific training samples from a trained model, but verifying that this removal actually happened is difficult: existing checks such as backdoor attacks and membership inference attacks (which probe whether a model still remembers data by trying to extract or manipulate it) can be fooled by a dishonest provider who simply retrains the model to pass the test rather than truly unlearning. This paper proposes IndirectVerify, a verification scheme built on influential sample pairs: trigger samples, which the user requests to be unlearned, and reaction samples, whose predictions are deliberately tied to the trigger samples through small, intentional perturbations of the training data. Because genuinely unlearning the trigger samples changes the model's behavior on the paired reaction samples, the scheme yields indirect evidence of unlearning that is harder for a dishonest provider to fake.
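A minimal sketch of the trigger/reaction idea in Python, assuming a scikit-learn-style classifier. The pair construction, the sample weighting used to exaggerate the trigger's influence, and all names (trigger_x, reaction) are illustrative assumptions, not the paper's actual algorithm; honest unlearning is simulated here by retraining from scratch without the trigger sample.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy training set: two Gaussian blobs, one per class.
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Hypothetical pair construction: a reaction sample just inside class 0's
# region, and a perturbed trigger sample right next to it with the opposite
# label. A large sample weight stands in for the trigger's engineered influence.
reaction = np.array([[-0.3, -0.3]])
trigger_x = reaction + rng.normal(0.0, 0.05, (1, 2))  # small perturbation
trigger_y = np.array([1])
weights = np.concatenate([np.ones(len(y)), [20.0]])

# Model trained WITH the trigger sample: the reaction sample's prediction
# should follow the trigger's label.
model_with = LogisticRegression(C=10.0).fit(
    np.vstack([X, trigger_x]),
    np.concatenate([y, trigger_y]),
    sample_weight=weights,
)
pred_before = model_with.predict(reaction)[0]

# Honest unlearning, simulated as retraining from scratch without the trigger.
model_without = LogisticRegression(C=10.0).fit(X, y)
pred_after = model_without.predict(reaction)[0]

# Indirect verification: if the trigger's influence was truly removed, the
# reaction sample's prediction flips back to its natural class.
print(f"prediction with trigger: {pred_before}, after unlearning: {pred_after}")
print("unlearning verified:", pred_before != pred_after)

In this toy setup the verifier never inspects the model's parameters; it only compares the reaction sample's prediction before and after the unlearning request, which is what makes the evidence indirect.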
Original source: http://ieeexplore.ieee.org/document/11202435
First tracked: February 12, 2026 at 02:22 PM
Classified by LLM (prompt v3) · confidence: 85%