Understanding and Enhancing Encoder-based Adversarial Transferability against Large Vision-Language Models
Summary
This research systematically studies the adversarial transferability of encoder-based attacks against large vision-language models (LVLMs), revealing that existing attacks transfer poorly across different LVLM architectures. The study identifies two root causes: inconsistent visual grounding across models and redundant semantic alignment within models. To address these limitations, the authors propose the Semantic-Guided Multimodal Attack (SGMA), a framework that improves transferability by directing perturbations toward semantically critical regions and disrupting cross-modal grounding at both the global and local levels.
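The summary describes the mechanism only at a high level, but masked perturbations combined with global/local alignment disruption map naturally onto a PGD-style optimization loop. The sketch below illustrates that idea under stated assumptions: the encoder interface (`image_encoder` returning global and patch features), the `semantic_mask` input, the loss weighting `w_local`, and all hyperparameters are hypothetical placeholders, not the paper's released implementation.

```python
# Minimal sketch of an SGMA-style attack, assuming a CLIP-like surrogate
# encoder. Interface, mask source, and hyperparameters are illustrative
# assumptions; the paper's exact formulation may differ.
import torch
import torch.nn.functional as F

def sgma_style_attack(image, text_emb, image_encoder, semantic_mask,
                      eps=8/255, alpha=1/255, steps=40, w_local=0.5):
    """Craft an adversarial image that (i) confines perturbation to
    semantically critical regions via `semantic_mask` and (ii) reduces
    both global and patch-level image-text alignment.

    image:          (B, 3, H, W) pixels in [0, 1]
    text_emb:       (B, D) caption embedding from a text encoder
    image_encoder:  callable -> (global_feats (B, D), patch_feats (B, P, D))
    semantic_mask:  (B, 1, H, W) saliency weights in [0, 1]
    """
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        adv = (image + delta * semantic_mask).clamp(0, 1)
        g_feat, p_feat = image_encoder(adv)
        # Global grounding: image embedding vs. caption embedding.
        sim_global = F.cosine_similarity(g_feat, text_emb).mean()
        # Local grounding: each patch embedding vs. the caption embedding.
        sim_local = F.cosine_similarity(
            p_feat, text_emb.unsqueeze(1), dim=-1).mean()
        # Minimize alignment at both levels to disrupt cross-modal grounding.
        loss = sim_global + w_local * sim_local
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()   # descend on similarity
            delta.clamp_(-eps, eps)              # enforce L_inf budget
        delta.grad.zero_()
    return (image + delta * semantic_mask).clamp(0, 1).detach()
```

In a concrete setup, the mask might come from attention rollout or Grad-CAM saliency on the surrogate encoder; the design choice mirrored here is that the perturbation budget is spatially reweighted so it is spent on semantically critical regions rather than spread uniformly.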
Original source: https://arxiv.org/abs/2602.09431v1
First tracked: February 11, 2026 at 06:00 PM