CoReVAD: A Contextual Reasoning Framework for Training-Free Video Anomaly Detection
The paper presents CoReVAD, a training-free framework for video anomaly detection (VAD) built on a frozen Vision-Language Model (VLM). Because it avoids task-specific training, the approach aims to reduce domain dependency and training cost. Instead of emitting only scalar anomaly scores, it also produces textual explanations. Experiments indicate that CoReVAD performs competitively with other training-free methods while explaining the anomalies it detects.
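The excerpt does not say how CoReVAD prompts the VLM or how it turns generated text into a number. Below is a minimal sketch that assumes each video segment is answered in a hypothetical `Score: <value>. Description: <text>` format. It shows how a score and an explanation could be pulled out of one free-text reply. The reply format, `parse_vlm_reply`, and `SegmentOutput` are illustrative assumptions, not the paper's interface.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical reply format, assumed only for this sketch:
#   "Score: 0.85. Description: a man breaks a car window."
_SCORE_RE = re.compile(r"score\s*[:=]\s*([0-9]*\.?[0-9]+)", re.IGNORECASE)
_DESC_RE = re.compile(r"description\s*[:=]\s*(.+)", re.IGNORECASE | re.DOTALL)


@dataclass
class SegmentOutput:
    score: Optional[float]  # None when the reply contains no parsable score
    description: str        # free-text explanation of the segment


def parse_vlm_reply(reply: str) -> SegmentOutput:
    """Split one VLM reply into an anomaly score in [0, 1] and a description."""
    score = None
    match = _SCORE_RE.search(reply)
    if match:
        # Clamp in case the model emits a value outside [0, 1].
        score = min(max(float(match.group(1)), 0.0), 1.0)
    desc = _DESC_RE.search(reply)
    # Fall back to the whole reply when no explicit description field exists.
    description = desc.group(1).strip() if desc else reply.strip()
    return SegmentOutput(score=score, description=description)


if __name__ == "__main__":
    out = parse_vlm_reply("Score: 0.85. Description: a man breaks a car window.")
    print(out.score, "|", out.description)
```

A reply with no parsable score comes back with `score=None`. A real pipeline would need a fallback for such cases, and the summary does not say which one the paper uses.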
- CoReVAD operates with a single frozen Vision-Language Model to generate anomaly scores and temporal descriptions.
- The framework introduces a Local Response Cleaning module to mitigate noise in generative outputs.
- Experiments on UCF-Crime and XD-Violence datasets demonstrate CoReVAD's competitive performance among training-free methods.
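The Local Response Cleaning module is named here but not described. The sketch below therefore substitutes a plain, generic technique: a sliding-window median over per-segment scores, which suppresses isolated spikes such as one noisy reply. It then repeats segment scores across frames and computes a frame-level ROC AUC, the metric commonly reported on UCF-Crime (XD-Violence is usually reported with average precision). The function names, the window radius, the 16 frames per segment, and the toy scores are all assumptions for illustration.

```python
from statistics import median
from typing import List, Sequence


def local_median_clean(scores: Sequence[float], radius: int = 1) -> List[float]:
    """Replace each segment score with the median of its (2*radius+1)-wide window.

    Generic spike suppression used as a stand-in; this is NOT the paper's
    Local Response Cleaning module, whose details are not given here.
    Windows are truncated at the sequence boundaries.
    """
    n = len(scores)
    cleaned = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        cleaned.append(median(scores[lo:hi]))
    return cleaned


def segments_to_frames(seg_values: Sequence[float], frames_per_segment: int) -> List[float]:
    """Repeat each segment-level value for every frame in that segment."""
    return [v for v in seg_values for _ in range(frames_per_segment)]


def frame_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the fraction of (anomalous, normal) frame pairs ranked correctly.

    Ties count as half a win. O(P*N) pairs, which is fine for a toy example.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both anomalous and normal frames")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy per-segment scores: an isolated false spike at segment 1,
    # and a genuine anomaly spanning segments 4-6.
    raw = [0.1, 0.9, 0.1, 0.2, 0.8, 0.9, 0.85, 0.1]
    seg_labels = [0, 0, 0, 0, 1, 1, 1, 0]
    cleaned = local_median_clean(raw, radius=1)
    frame_labels = segments_to_frames(seg_labels, 16)  # 16 frames/segment: assumed
    print("raw AUC:", frame_auc(segments_to_frames(raw, 16), frame_labels))
    print("cleaned AUC:", frame_auc(segments_to_frames(cleaned, 16), frame_labels))
    # On this toy input: raw AUC is about 0.833, cleaned AUC is 1.0.
```

A median window is a trade-off. Away from the sequence boundaries, any run of `radius` or fewer high-scoring segments is flattened, whether it is noise or a genuinely brief anomaly. For real evaluations, scikit-learn's `roc_auc_score` and `average_precision_score` compute these metrics efficiently.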
Opening excerpt (first ~120 words)
arXiv:2605.23116 (cs), Computer Vision and Pattern Recognition [Submitted on 22 May 2026]
Authors: Hyeongmuk Lim, Youngbum Hur
Abstract: Existing Video Anomaly Detection (VAD) methods typically rely on task-specific training, leading to strong domain dependency and high training costs. Moreover, most existing methods output only scalar anomaly scores, providing limited insight into why specific events are considered abnormal.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.