Exploring Security Vulnerabilities in Multilingual Speech Translation Systems via Deceptive Inputs
Summary
Researchers found that speech translation (ST) systems, which translate speech in one language into another language, can be manipulated by specially crafted audio perturbations that human listeners cannot perceive. They demonstrated two attack methods: adapting techniques from attacks on automatic speech recognition (ASR), and using music-based perturbations to steer the system toward harmful outputs. The attacks worked across multiple languages and models, revealing a fundamental weakness in how current ST systems process and interpret audio.
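This summary does not describe the paper's loss functions, models, or perceptibility constraints, so the following is only a generic illustration of the underlying idea: a small, amplitude-bounded perturbation can shift a model's output toward a chosen target. It is a minimal sketch against a toy linear softmax classifier, not the authors' method and not a real ST system. The function name `targeted_perturbation`, the weight matrix `W`, and the values of `eps` and `step` are all assumptions made for illustration.

```python
import numpy as np


def targeted_perturbation(x, W, target, eps=0.002, step=0.0005, iters=50):
    """Toy projected-gradient sketch (hypothetical, not the paper's attack).

    Searches for a perturbation `delta` with max |delta| <= eps that moves
    the toy linear softmax model `W @ (x + delta)` toward class `target`.
    """
    delta = np.zeros_like(x)
    for _ in range(iters):
        logits = W @ (x + delta)
        probs = np.exp(logits - logits.max())  # numerically stable softmax
        probs /= probs.sum()

        # Gradient of the cross-entropy loss toward `target` w.r.t. the input.
        onehot = np.zeros_like(probs)
        onehot[target] = 1.0
        grad = W.T @ (probs - onehot)

        # Signed descent step, then projection back into the L-infinity ball.
        delta = np.clip(delta - step * np.sign(grad), -eps, eps)
        # Keep the perturbed waveform inside the valid sample range [-1, 1].
        delta = np.clip(x + delta, -1.0, 1.0) - x
    return delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    waveform = 0.1 * np.sin(np.linspace(0, 20, 400))  # stand-in for audio
    weights = rng.normal(size=(2, 400))               # stand-in for a model
    delta = targeted_perturbation(waveform, weights, target=1)
    print("max |delta|:", np.max(np.abs(delta)))
    print("logits before:", weights @ waveform)
    print("logits after: ", weights @ (waveform + delta))
```

The L-infinity bound `eps` stands in for an imperceptibility constraint here. Real attacks on audio models typically rely on perceptual or psychoacoustic criteria, which this sketch does not model.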
Original source: http://ieeexplore.ieee.org/document/11367280
First tracked: May 7, 2026 at 08:03 PM
Classified by LLM (prompt v3) · confidence: 92%