AI assistants can be hijacked and manipulated by inaudible sounds
A recent study reveals vulnerabilities in large audio-language models (LALMs) that can be exploited through imperceptible auditory prompt injections. The researchers developed a framework called AudioHijack, which generates adversarial audio that steers these models into performing unauthorized actions. The findings highlight the urgent need for improved security measures in voice interaction technologies.
- The study exposes critical vulnerabilities in large audio-language models (LALMs).
- AudioHijack is a framework that generates imperceptible adversarial audio to hijack LALMs.
- Experiments showed success rates of 79% to 96% in manipulating LALMs across various user contexts.
Opening excerpt (first ~120 words)
arXiv:2604.14604 (cs), Computer Science > Cryptography and Security, submitted on 16 Apr 2026
Title: Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection
Authors: Meng Chen, Kun Wang, Li Lu, Jiaheng Zhang, Tianwei Zhang
Abstract: Modern Large audio-language models (LALMs) power intelligent voice interactions by tightly integrating audio and text. This integration, however, expands the attack surface beyond text and introduces vulnerabilities in the continuous, high-dimensional audio channel.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv.org.