Extracting Training Dialogue Data From Large Language Model-Based Task Bots
Summary
Large Language Models (LLMs, AI systems trained on massive amounts of text) used in task-oriented dialogue systems (AI assistants designed to help users complete specific goals, like booking travel) can inadvertently memorize and leak sensitive training data, including personal information such as phone numbers and complete travel schedules. The researchers demonstrate new attack techniques that extract thousands of pieces of training data from these systems, with over 70% accuracy in the best cases. The paper identifies factors that influence how much data LLMs memorize in dialogue systems but does not propose specific mitigations.
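To illustrate the general shape of such an attack (not the paper's exact method), the sketch below shows a prefix-based extraction probe: the attacker feeds a dialogue-style prefix to the model, samples many continuations, and scans them for PII-like strings. The model name "distilgpt2", the prefix wording, and the phone-number regex are all illustrative assumptions standing in for a fine-tuned task bot and its real probing prompts.

```python
# Minimal sketch of a prefix-based training-data extraction probe.
# Assumption: "distilgpt2" stands in for a task-bot LLM fine-tuned on
# dialogue transcripts; the prefix mimics a dialogue context that
# might precede memorized content.
import re
from transformers import pipeline

# Hypothetical stand-in for the deployed task-bot model.
generator = pipeline("text-generation", model="distilgpt2")

# Dialogue-style prefix chosen to elicit memorized booking details.
prefix = "Agent: I have confirmed your reservation. Your contact number is"

# Sample many continuations; memorized sequences tend to recur verbatim.
outputs = generator(
    prefix,
    max_new_tokens=30,
    num_return_sequences=20,
    do_sample=True,
    temperature=0.8,
)

# Flag continuations containing phone-number-like strings (possible PII leaks).
phone_pattern = re.compile(r"\+?\d[\d\s().-]{7,}\d")
for out in outputs:
    continuation = out["generated_text"][len(prefix):]
    match = phone_pattern.search(continuation)
    if match:
        print("candidate leak:", match.group())
```

In a real evaluation, candidate leaks would be verified against the training corpus; the 70%+ extraction accuracy reported above refers to such verified matches, not to raw regex hits.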
Classification
Related Issues
CVE-2025-45150: Insecure permissions in LangChain-ChatGLM-Webui commit ef829 allows attackers to arbitrarily view and download sensitive
CVE-2025-54868: LibreChat is a ChatGPT clone with additional features. In versions 0.0.6 through 0.7.7-rc1, an exposed testing endpoint
Original source: http://ieeexplore.ieee.org/document/11422042
First tracked: March 20, 2026 at 08:03 AM
Classified by LLM (prompt v3) · confidence: 92%