AVBench: Human-Aligned and Automated Evaluation Benchmark for Audio-Video Generative Models
AVBench is a newly introduced benchmark for evaluating audio-video (AV) generative models, particularly in human-centric scenarios. It combines fine-grained metrics with specialized evaluators to assess model capabilities more accurately than coarse-grained benchmarks. The evaluation is automated and designed to align closely with human judgment, so that its scores can guide improvements in AV generation.
- AVBench integrates ten evaluation dimensions focused on human-centered real-world scenarios, including visual and audio quality.
- The benchmark utilizes a probabilistic scoring mechanism to derive continuous evaluation scores from model predictions.
- AVBench aims to address the limitations of existing coarse-grained benchmarks and improve the accuracy of assessments in AV generation.
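The summary does not spell out how the probabilistic scoring mechanism works. One common way to turn an evaluator's discrete rating predictions into a continuous score is to take the probability-weighted average of the rating levels. The sketch below illustrates that general idea only; the function name `expected_score`, the 1-to-5 rating scale, and the use of raw logits as input are assumptions made for illustration and are not taken from the paper.

```python
import math


def expected_score(logits, levels):
    """Return a continuous score from per-level logits.

    Hypothetical helper: applies a softmax over the logits an evaluator
    assigns to each discrete rating level, then returns the expectation
    of the rating under that distribution.
    """
    if len(logits) != len(levels) or not logits:
        raise ValueError("logits and levels must be non-empty and equal in length")

    # Subtract the maximum logit before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Probability-weighted average of the rating levels.
    return sum(p * level for p, level in zip(probs, levels))


# Assumed 1-to-5 rating scale, for illustration only.
LEVELS = [1, 2, 3, 4, 5]

uncertain = expected_score([0.0, 0.0, 0.0, 0.0, 0.0], LEVELS)
confident = expected_score([-4.0, -2.0, 0.0, 2.0, 4.0], LEVELS)
```

With equal logits the score sits at the midpoint of the scale, and it moves toward the top level as the evaluator puts more probability mass there. Unlike taking only the most likely rating, this keeps the evaluator's uncertainty in the final number.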
Opening excerpt (first ~120 words)
arXiv:2605.24652 (cs) [Submitted on 23 May 2026]
Title: AVBench: Human-Aligned and Automated Evaluation Benchmark for Audio-Video Generative Models
Authors: Jialiang Yang, Bin Xia, Ruihang Chu, Dingdong Wang, Wanke Xia, Zhun Mou, Tianyang Zhong, Yiting Zhao, Wenming Yang
Abstract: Rapid advances in audio-video (AV) generation have enabled high-fidelity synthesis with synchronized sound, particularly for human-related scenarios involving speech and…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.