BiomedAP: A Vision-Informed Dual-Anchor Framework with Gated Cross-Modal Fusion for Robust Medical Vision-Language Adaptation
BiomedAP is a framework for adapting biomedical vision-language models (VLMs) to few-shot medical diagnosis. It targets the fragility of these models to prompt variations by using a vision-informed dual-anchor design combined with gated cross-modal fusion. The authors report that, across 11 benchmarks, BiomedAP achieves higher few-shot accuracy and robustness than baseline methods.
- BiomedAP uses a vision-informed dual-anchor framework to enhance medical vision-language adaptation.
- The framework incorporates Gated Cross-Modal Fusion to improve interaction between the visual and textual modalities.
- Experiments across 11 benchmarks indicate that BiomedAP achieves superior few-shot accuracy and robustness compared to baseline models.
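
The summary does not describe how the gated fusion is built, so the following is only a minimal sketch of one common form of gating, not the paper's actual method. It assumes the text and image features have the same length and that a sigmoid gate, computed from both features, blends them per dimension. The names `gated_fusion`, `weights`, and `bias` are hypothetical and chosen for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_feat, image_feat, weights, bias):
    """Blend two equal-length feature vectors with a per-dimension gate.

    Assumed form (illustrative, not taken from the paper):
        gate_i  = sigmoid(weights[i] . [text; image] + bias[i])
        fused_i = gate_i * text_i + (1 - gate_i) * image_i
    """
    if len(text_feat) != len(image_feat):
        raise ValueError("text and image features must have the same length")

    # The gate sees both modalities, so it is computed from their concatenation.
    concat = list(text_feat) + list(image_feat)

    fused = []
    for i, (t, v) in enumerate(zip(text_feat, image_feat)):
        logit = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        gate = sigmoid(logit)
        # gate near 1 keeps the text feature, gate near 0 keeps the image feature.
        fused.append(gate * t + (1.0 - gate) * v)
    return fused


if __name__ == "__main__":
    text = [1.0, 3.0]
    image = [3.0, 1.0]
    zero_weights = [[0.0] * 4, [0.0] * 4]
    # With zero weights and zero bias every gate is 0.5, so the result is the mean.
    print(gated_fusion(text, image, zero_weights, [0.0, 0.0]))
```

In a trained model the gate parameters would be learned, which lets the model decide per dimension how much to trust each modality. Here they are fixed by hand to keep the example deterministic.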
Opening excerpt (first ~120 words)
arXiv:2605.15736 (cs), Computer Science > Computer Vision and Pattern Recognition. Submitted on 15 May 2026.
Authors: Huanyang Tong, Kai Liu, Fangjun Kuang, Huiling Chen
Abstract: Biomedical Vision-Language Models (VLMs) have shown remarkable promise in few-shot medical diagnosis but face a critical bottleneck: fragility to prompt variations. Existing adaptation frameworks typically optimize visual and…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.