Extracting Training Dialogue Data From Large Language Model-Based Task Bots
Summary
Large Language Models (LLMs, AI systems trained on massive amounts of text) used in task-oriented dialogue systems (AI assistants designed to help users complete specific goals, like booking travel) can inadvertently memorize and leak sensitive training data, including personal information such as phone numbers and complete travel schedules. The researchers demonstrate new attack techniques that extract thousands of pieces of training data from these systems, with over 70% accuracy in the best cases. The paper identifies factors that influence how much data LLMs memorize in dialogue systems but does not propose specific mitigations.
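To illustrate the general shape of such an attack (not the paper's exact method), the sketch below shows a prefix-based extraction probe: the attacker feeds a dialogue-style prefix to the model, samples many continuations, and scans them for PII-like strings. The model name "distilgpt2", the prefix wording, and the phone-number regex are all illustrative assumptions standing in for a fine-tuned task bot and its real probing prompts.

```python
# Minimal sketch of a prefix-based training-data extraction probe.
# Assumption: "distilgpt2" stands in for a task-bot LLM fine-tuned on
# dialogue transcripts; the prefix mimics a dialogue context that
# might precede memorized content.
import re
from transformers import pipeline

# Hypothetical stand-in for the deployed task-bot model.
generator = pipeline("text-generation", model="distilgpt2")

# Dialogue-style prefix chosen to elicit memorized booking details.
prefix = "Agent: I have confirmed your reservation. Your contact number is"

# Sample many continuations; memorized sequences tend to recur verbatim.
outputs = generator(
    prefix,
    max_new_tokens=30,
    num_return_sequences=20,
    do_sample=True,
    temperature=0.8,
)

# Flag continuations containing phone-number-like strings (possible PII leaks).
phone_pattern = re.compile(r"\+?\d[\d\s().-]{7,}\d")
for out in outputs:
    continuation = out["generated_text"][len(prefix):]
    match = phone_pattern.search(continuation)
    if match:
        print("candidate leak:", match.group())
```

In a real evaluation, candidate leaks would be verified against the training corpus; the 70%+ extraction accuracy reported above refers to such verified matches, not to raw regex hits.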
Classification
Related Issues
CVE-2025-45150: Insecure permissions in LangChain-ChatGLM-Webui commit ef829 allows attackers to arbitrarily view and download sensitive
CVE-2025-54868: LibreChat is a ChatGPT clone with additional features. In versions 0.0.6 through 0.7.7-rc1, an exposed testing endpoint
Original source: http://ieeexplore.ieee.org/document/11422042
First tracked: March 20, 2026 at 08:03 AM
Classified by LLM (prompt v3) · confidence: 92%