Complementary Text-Guided Attention for Zero-Shot Adversarial Robustness
Summary
CLIP and similar vision-language models (AI systems trained on paired images and text) are vulnerable to adversarial examples (images modified in carefully crafted ways to fool the model). Researchers proposed two methods, TGA-ZSR and Comp-TGA, that use text-guided attention (the model's focus on image regions, steered by text descriptions) to make these models more robust. The methods reportedly improve accuracy on adversarial examples by 9.58% and 11.95%, respectively.
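To make the idea of text-guided attention more concrete, here is a minimal sketch in Python with NumPy. It is not the paper's implementation: the function names, the cosine-similarity-plus-softmax formulation, and the "attention gap" comparison are all assumptions chosen for illustration. It only shows how a text embedding could weight image patch features, and how that weighting might shift when the image is perturbed.

```python
import numpy as np


def text_guided_attention(patch_feats, text_feat):
    """Hypothetical helper: weight image patches by similarity to a text embedding.

    patch_feats: array of shape (N, D), one feature vector per image patch.
    text_feat:   array of shape (D,), embedding of the text description.
    Returns an array of shape (N,) with non-negative weights that sum to 1.
    """
    # Normalize so the dot product below is a cosine similarity.
    patches = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    text = text_feat / np.linalg.norm(text_feat)
    sim = patches @ text
    # Numerically stable softmax over patches.
    weights = np.exp(sim - sim.max())
    return weights / weights.sum()


def attention_gap(clean_patches, adv_patches, text_feat):
    """Hypothetical measure: L2 distance between clean and perturbed attention maps."""
    a_clean = text_guided_attention(clean_patches, text_feat)
    a_adv = text_guided_attention(adv_patches, text_feat)
    return float(np.linalg.norm(a_clean - a_adv))


# Toy data standing in for real encoder outputs (assumption: 49 patches, 16 dims).
rng = np.random.default_rng(0)
clean = rng.normal(size=(49, 16))
text = rng.normal(size=16)
perturbed = clean + 0.5 * rng.normal(size=clean.shape)

print("attention sums to:", text_guided_attention(clean, text).sum())
print("gap, clean vs. perturbed:", attention_gap(clean, perturbed, text))
```

In a real setting the patch and text features would come from the model's image and text encoders rather than random numbers. A larger gap would suggest that the perturbation moved the model's text-conditioned focus away from where it was on the clean image.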
Original source: http://ieeexplore.ieee.org/document/11417939
First tracked: June 8, 2026 at 08:04 PM
Classified by LLM (prompt v3) · confidence: 85%