PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models
Summary
Text-to-image models (AI systems that generate pictures from written descriptions) can be misused to create unsafe content such as sexually explicit or violent images. PromptGuard is a safety technique that uses a soft prompt (a learned input optimized for safety that operates in the model's text embedding space) to moderate unsafe requests, preventing the generation of such content while preserving the quality of benign images.
Solution / Mitigation
The source describes PromptGuard as the solution itself rather than a patch or update. The technique optimizes a safety soft prompt that functions as an implicit system prompt within the text-to-image model's embedding space, using a divide-and-conquer strategy: category-specific soft prompts are optimized separately and then combined into holistic safety guidance. Code and dataset are available at https://t2i-promptguard.github.io/
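The mechanism described above can be illustrated with a minimal NumPy sketch. All names, dimensions, the averaging rule, and the category list here are illustrative assumptions, not the paper's actual implementation; in practice the soft prompts are optimized against the model, and the combined sequence conditions a diffusion model through its text encoder.

```python
import numpy as np

EMBED_DIM = 8    # toy embedding size; real T2I text encoders use 768+
PROMPT_LEN = 4   # number of learned soft-prompt tokens per category

# Hypothetical category-specific soft prompts. In PromptGuard these are
# optimized per unsafe category; random values stand in for them here.
rng = np.random.default_rng(0)
category_prompts = {
    "sexual":   rng.normal(size=(PROMPT_LEN, EMBED_DIM)),
    "violence": rng.normal(size=(PROMPT_LEN, EMBED_DIM)),
}

def combine_category_prompts(prompts):
    """Merge per-category soft prompts token-wise into one holistic
    safety prompt. Token-wise averaging is one simple choice; the
    paper's exact combination rule may differ."""
    stacked = np.stack(list(prompts.values()))  # (categories, P, D)
    return stacked.mean(axis=0)                 # (P, D)

def guard_embeddings(text_embeds, safety_prompt):
    """Prepend the safety soft prompt to the user prompt's token
    embeddings, acting like an implicit system prompt in embedding
    space rather than in visible text."""
    return np.concatenate([safety_prompt, text_embeds], axis=0)

# Toy user prompt: 6 token embeddings from a hypothetical text encoder.
user_embeds = rng.normal(size=(6, EMBED_DIM))
safety_prompt = combine_category_prompts(category_prompts)
conditioned = guard_embeddings(user_embeds, safety_prompt)
print(conditioned.shape)  # (PROMPT_LEN + 6, EMBED_DIM) -> (10, 8)
```

The key design point this sketch captures is that moderation happens inside the conditioning pipeline, so no user-visible prompt rewriting or output filtering is required.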
Classification
Affected Vendors
Original source: http://ieeexplore.ieee.org/document/11457697
First tracked: April 23, 2026 at 08:02 PM
Classified by LLM (prompt v3) · confidence: 92%