Transferable Adversarial Attack on Referring Video Object Segmentation
Summary
Referring video object segmentation (RVOS) is the task of identifying and outlining objects in a video based on a text description. It is used in safety-critical applications such as autonomous driving, but the deep neural networks behind these systems are vulnerable to adversarial perturbations (tiny, intentional changes to input data designed to fool AI models). This research demonstrates for the first time that RVOS models can be reliably attacked. The proposed method, xM-ICM, corrupts both the visual and the text information to mislead the model, and the attack transfers: it still works when the attacker has limited information about the target system.
Original source: http://ieeexplore.ieee.org/document/11480168
First tracked: April 23, 2026 at 08:02 PM
Classified by LLM (prompt v3) · confidence: 85%