CBT-Audio: Evaluating Audio Language Models for Patient-Side Distress Intensity Estimation in CBT Session Recordings
The article describes CBT-Audio, a dataset for evaluating how well models estimate patient distress from audio recordings of cognitive behavioral therapy (CBT) sessions. It points out that existing AI systems rely mainly on text, and argues that vocal delivery is important for understanding patient distress. The findings indicate that, across several audio language models, combining audio with transcripts significantly improves distress estimation accuracy.
- CBT-Audio contains 1,802 patient turns from 96 publicly available CBT recordings.
- Each patient turn carries a distress label, and the labels are validated by experts.
- Results show that audio carries information beyond what the text conveys, and that it helps most when combined with transcripts.
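The turn-level design described above can be pictured as one record per patient turn, with each input condition (for example text-only versus audio plus transcript) scored against the same gold labels. The sketch below is a minimal illustration under stated assumptions: the `PatientTurn` schema, the 0-10 distress scale, the condition names, and mean absolute error as the metric are all hypothetical choices, not the paper's actual data format or evaluation protocol. The toy labels and predictions are made up for illustration and are not results from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PatientTurn:
    """One patient turn from a session recording (hypothetical schema)."""
    recording_id: str  # which CBT session recording the turn came from
    turn_index: int    # position of the turn within that recording
    transcript: str    # text of what the patient said
    audio_path: str    # hypothetical path to the turn's audio clip
    distress: int      # expert-validated label; the 0-10 scale is an assumption


def mean_absolute_error(gold: list[int], pred: list[float]) -> float:
    """Average absolute gap between gold labels and model estimates."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    return mean(abs(g - p) for g, p in zip(gold, pred))


def compare_conditions(
    turns: list[PatientTurn], predictions: dict[str, list[float]]
) -> dict[str, float]:
    """Score every input condition against the same gold labels."""
    gold = [t.distress for t in turns]
    return {cond: mean_absolute_error(gold, preds) for cond, preds in predictions.items()}


def turns_per_recording(n_turns: int, n_recordings: int) -> float:
    """Average number of patient turns contributed by each recording."""
    return n_turns / n_recordings


# Toy, made-up example: three turns from two recordings.
turns = [
    PatientTurn("rec01", 0, "I haven't been sleeping much.", "rec01_t0.wav", 6),
    PatientTurn("rec01", 1, "This week was a bit easier.", "rec01_t1.wav", 3),
    PatientTurn("rec02", 0, "Work has been okay.", "rec02_t0.wav", 7),
]
scores = compare_conditions(turns, {
    "text_only": [4.0, 3.0, 2.0],        # hypothetical text-only estimates
    "audio_plus_text": [5.0, 3.0, 6.0],  # hypothetical audio + transcript estimates
})
print(scores)
print(round(turns_per_recording(1802, 96), 1))  # dataset figures from the summary
```

Scoring every condition against one shared set of gold labels is what makes a text-only versus audio-plus-text comparison like-for-like. The last line only checks the dataset figures quoted in the summary: 1,802 turns over 96 recordings is roughly 18.8 patient turns per recording on average.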
Opening excerpt (first ~120 words)
arXiv:2605.17370 (cs) [Submitted on 17 May 2026]
Authors: Qixuan Hu, Shuchang Ye, Xumou Zhang, Anastasia Serafimovska, Anastasia Suraev, Amit Saha, Ping-hsiu Lin, Sydney Su, Usman Naseem, Adam G. Dunn, Jinman Kim
Abstract: Cognitive behavioural therapy is widely used to help patients understand and manage psychological distress.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.