AGFPS: An Automated Gradient-Free Framework for Prompt Stealing
Summary
AGFPS is an automated attack framework that steals system prompts (the hidden instructions that control how an LLM behaves) from deployed AI applications. Instead of gradient-based methods, it uses evolutionary optimization (a technique that mimics natural selection to find solutions). The researchers report that their approach extracted prompts successfully 95.2% of the time and outperformed previous methods, which highlights serious security weaknesses in how LLMs are currently deployed.
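To make "gradient-free evolutionary optimization" concrete, here is a minimal toy sketch of the general idea. It is not the AGFPS algorithm, whose details are not given in this summary. Every name here (`make_black_box`, `mutate`, `evolve`) and every parameter value is a hypothetical chosen for illustration. The position-match score is also an assumption: it stands in for whatever black-box feedback signal a real optimizer would use, and no LLM is involved.

```python
import random
import string

# Characters the toy search is allowed to use (an illustrative assumption).
ALPHABET = string.ascii_lowercase + " "


def make_black_box(secret):
    """Return a scorer that hides `secret` and exposes only a number."""
    def score(candidate):
        # Number of positions where the candidate matches the hidden string.
        return sum(a == b for a, b in zip(candidate, secret))
    return score


def mutate(candidate, rate, rng):
    """Replace each character with a random one with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else ch
        for ch in candidate
    )


def evolve(score, length, pop_size=50, elite=10, rate=0.05,
           generations=2000, seed=0):
    """Search for a high-scoring string using only calls to `score`."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(ALPHABET) for _ in range(length))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Selection: rank candidates by black-box score, best first.
        population.sort(key=score, reverse=True)
        if score(population[0]) == length:
            break
        # Elitism: the best candidates survive unchanged.
        parents = population[:elite]
        # Variation: fill the rest of the population with mutated parents.
        children = [
            mutate(rng.choice(parents), rate, rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return max(population, key=score)


if __name__ == "__main__":
    scorer = make_black_box("be concise")
    print(evolve(scorer, length=len("be concise")))
```

The loop only ever calls `score`, never a derivative of it, which is what "gradient-free" means: the search needs query access to a scoring signal but no access to model internals.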
Original source: http://ieeexplore.ieee.org/document/11425813
First tracked: May 14, 2026 at 08:01 PM