PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models
Summary
Text-to-image models (AI systems that generate pictures from written descriptions) can be misused to create unsafe content such as sexually explicit or violent images. PromptGuard is a safety technique that uses a soft prompt (a learned input optimized for safety that operates in the model's text embedding space) to moderate unsafe requests, preventing the generation of such content while preserving the quality of benign images.
Solution / Mitigation
The source describes PromptGuard as the solution itself rather than a patch or update. The technique optimizes a safety soft prompt that functions as an implicit system prompt within the text-to-image model's embedding space, using a divide-and-conquer strategy: category-specific soft prompts are optimized separately and then combined into holistic safety guidance. Code and dataset are available at https://t2i-promptguard.github.io/
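The mechanism described above can be illustrated with a minimal NumPy sketch. All names, dimensions, the averaging rule, and the category list here are illustrative assumptions, not the paper's actual implementation; in practice the soft prompts are optimized against the model, and the combined sequence conditions a diffusion model through its text encoder.

```python
import numpy as np

EMBED_DIM = 8    # toy embedding size; real T2I text encoders use 768+
PROMPT_LEN = 4   # number of learned soft-prompt tokens per category

# Hypothetical category-specific soft prompts. In PromptGuard these are
# optimized per unsafe category; random values stand in for them here.
rng = np.random.default_rng(0)
category_prompts = {
    "sexual":   rng.normal(size=(PROMPT_LEN, EMBED_DIM)),
    "violence": rng.normal(size=(PROMPT_LEN, EMBED_DIM)),
}

def combine_category_prompts(prompts):
    """Merge per-category soft prompts token-wise into one holistic
    safety prompt. Token-wise averaging is one simple choice; the
    paper's exact combination rule may differ."""
    stacked = np.stack(list(prompts.values()))  # (categories, P, D)
    return stacked.mean(axis=0)                 # (P, D)

def guard_embeddings(text_embeds, safety_prompt):
    """Prepend the safety soft prompt to the user prompt's token
    embeddings, acting like an implicit system prompt in embedding
    space rather than in visible text."""
    return np.concatenate([safety_prompt, text_embeds], axis=0)

# Toy user prompt: 6 token embeddings from a hypothetical text encoder.
user_embeds = rng.normal(size=(6, EMBED_DIM))
safety_prompt = combine_category_prompts(category_prompts)
conditioned = guard_embeddings(user_embeds, safety_prompt)
print(conditioned.shape)  # (PROMPT_LEN + 6, EMBED_DIM) -> (10, 8)
```

The key design point this sketch captures is that moderation happens inside the conditioning pipeline, so no user-visible prompt rewriting or output filtering is required.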
Classification
Affected Vendors
Original source: http://ieeexplore.ieee.org/document/11457697
First tracked: April 23, 2026 at 08:02 PM
Classified by LLM (prompt v3) · confidence: 92%