Retrieval-Augmented Large Language Models for Schema-Constrained Clinical Information Extraction
The paper presents a retrieval-augmented large language model approach for extracting structured clinical information from nurse-patient conversations. Motivated by the documentation burden in healthcare, it proposes a modular pipeline intended to improve extraction accuracy. The reported results indicate that schema-constrained prompting and second-pass auditing both improve extraction performance.
- The study focuses on extracting observations from conversational nurse-patient transcripts.
- A modular retrieval-augmented generation pipeline is proposed to normalize narratives into a predefined schema.
- The best configuration achieved an F1 score of 80.36% using GPT-5.2 with full schema and second-pass auditing.
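To make the ideas of schema constraint and F1 scoring concrete, here is a minimal sketch. It is not the authors' pipeline: the schema fields, the allowed values, the `constrain_to_schema` and `f1_score` helpers, and the toy data are all hypothetical, and the paper's actual schema and scoring procedure are not shown in this summary.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical schema for illustration only: field -> allowed values.
# None means any value is accepted for that field.
SCHEMA: Dict[str, Optional[Set[str]]] = {
    "pain_level": {"none", "mild", "moderate", "severe"},
    "nausea": {"yes", "no"},
    "temperature_f": None,
}

Observation = Tuple[str, str]  # (field, value)


def constrain_to_schema(candidates: List[Observation]) -> Set[Observation]:
    """Keep only candidates whose field and value are permitted by SCHEMA."""
    kept: Set[Observation] = set()
    for field, value in candidates:
        if field not in SCHEMA:
            continue  # field is outside the predefined schema
        allowed = SCHEMA[field]
        if allowed is None or value in allowed:
            kept.add((field, value))
    return kept


def f1_score(predicted: Set[Observation], gold: Set[Observation]) -> float:
    """Exact-match F1 over (field, value) pairs."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy model output (assumed): one off-schema field, one disallowed value.
raw_output = [
    ("pain_level", "moderate"),
    ("mood", "anxious"),
    ("nausea", "maybe"),
    ("temperature_f", "99.1"),
]
gold = {("pain_level", "moderate"), ("nausea", "no"), ("temperature_f", "99.1")}

predicted = constrain_to_schema(raw_output)
score = f1_score(predicted, gold)
print(f"F1 = {score:.2f}")
```

In this toy case the off-schema `mood` entry and the invalid `nausea` value are dropped, so precision is 1.0 while recall is 2/3 because the gold `nausea` observation is missed.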
Opening excerpt (first ~120 words)
arXiv:2605.15467 (cs; Computer Science > Computation and Language). Submitted on 14 May 2026.
Authors: A H M Rezaul Karim, Ozlem Uzuner
Abstract: Conversational nurse-patient transcripts contain actionable observations, but converting these transcripts into structured representations at scale remains challenging. Documentation burden is substantial, with prior studies showing clinicians spend large portions of their workday on documentation and related desk work rather than direct patient care.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.