Safety Geometry Collapse in Multimodal LLMs and Adaptive Drift Correction
The paper examines why multimodal large language models (MLLMs) fail to transfer safety capabilities learned in text to other input modalities. It introduces Safety Geometry Collapse, a failure mode in which modality-induced drift compresses the usable separation along refusal directions, leaving models less able to refuse harmful non-text inputs. The authors propose ReGap, an inference-time method that adaptively corrects this drift and improves the safety of MLLMs without sacrificing their general capabilities.
- Multimodal large language models often fail to transfer safety capabilities learned in text to non-text inputs.
- The study identifies a failure mode termed Safety Geometry Collapse, which compresses usable separation along refusal directions.
- The proposed ReGap method adaptively corrects modality drift and improves multimodal safety during inference.
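The idea of a refusal direction and a modality drift can be pictured with a toy sketch. This is not the paper's ReGap algorithm, whose details are not given in this summary. It assumes, purely for illustration, that a refusal direction is the difference of mean hidden states between harmful and harmless text prompts, that a refusal threshold is calibrated on text only, and that non-text inputs are shifted by a constant offset. All vectors, names, and the mean-shift correction are hypothetical.

```python
import math

def mean_vec(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sub(a, b):
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    """Dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def would_refuse(h, direction, threshold):
    """Toy rule: refuse when the projection onto the direction exceeds the threshold."""
    return dot(h, direction) > threshold

# Hypothetical 2-D "hidden states" for text prompts (made-up numbers).
text_harmful = [[2.0, 0.0], [2.2, 0.2]]
text_harmless = [[0.0, 0.0], [0.2, 0.2]]

# Assumed refusal direction: difference of class means, normalized.
direction = unit(sub(mean_vec(text_harmful), mean_vec(text_harmless)))

# Threshold calibrated on text only: midpoint of the two mean projections.
threshold = (dot(mean_vec(text_harmful), direction)
             + dot(mean_vec(text_harmless), direction)) / 2

# Assumed modality drift: a constant offset applied to every non-text input.
true_drift = [-1.5, 0.5]
image_harmful = [[x + d for x, d in zip(h, true_drift)] for h in text_harmful]
image_harmless = [[x + d for x, d in zip(h, true_drift)] for h in text_harmless]

# Toy correction: estimate the drift as the gap between modality means,
# then subtract it before projecting onto the refusal direction.
est_drift = sub(mean_vec(image_harmful + image_harmless),
                mean_vec(text_harmful + text_harmless))

before = [would_refuse(h, direction, threshold) for h in image_harmful]
after = [would_refuse(sub(h, est_drift), direction, threshold)
         for h in image_harmful]
harmless_after = [would_refuse(sub(h, est_drift), direction, threshold)
                  for h in image_harmless]
print(before, after, harmless_after)
```

In this toy setup the drifted harmful inputs fall below the text-calibrated threshold, and subtracting the estimated offset restores the refusals without flagging the harmless inputs. A real model would have high-dimensional activations and input-dependent drift, which is presumably where an adaptive correction becomes necessary.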
Opening excerpt (first ~120 words)
arXiv:2605.18104 (cs), submitted on 18 May 2026
Title: Safety Geometry Collapse in Multimodal LLMs and Adaptive Drift Correction
Authors: Jiahe Guo, Xiangran Guo, Jiaxuan Chen, Weixiang Zhao, Yanyan Zhao, Yutai Hou, Qianchao Wang, Dandan Tu, Bing Qin
Abstract: Multimodal large language models (MLLMs) often fail to transfer safety capabilities learned in the text modality to semantically equivalent non-text inputs, revealing a persistent multimodal safety gap.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.