A Fine-Tuning Data Recovery Attack on Generative Language Models via Backdooring
Summary
Researchers discovered a new attack called Lure that targets generative language models (GLMs, which are AI systems that generate text) during the fine-tuning process (when developers customize an open-source model with their own data). By hiding malicious code in the source code of an open-source model, attackers can trick a fine-tuned model into remembering and later revealing the proprietary data used to customize it through specially crafted prompts (input text designed to trigger specific outputs).
Classification
Related Issues
Original source: http://ieeexplore.ieee.org/document/11422005
First tracked: March 23, 2026 at 08:02 PM
Classified by LLM (prompt v3) · confidence: 92%