2 results for "diagnostic errors"
ARXIV.ORG
CT-FineBench: A Diagnostic Fidelity Benchmark for Fine-Grained Evaluation of CT Report Generation
The evaluation of generated reports remains a critical challenge in Computed Tomography (CT) report generation, due to the large volume of text, the diversity and complexity of findings, and the prese…
ARXIV CS.AI
DO-Bench: An Attributable Benchmark for Diagnosing Object Hallucination in Vision-Language Models
Object level hallucination remains a central reliability challenge for vision language models (VLMs), particularly in binary object existence verification. Existing benchmarks emphasize aggregate accu…