
Claude, GPT, Gemini Agents Fail 72% of U.S. Healthcare Workflows

⚡ TL;DR · AI summary

A new benchmark study finds that AI agents built on Claude, GPT, and Gemini fail to complete 72% of U.S. healthcare workflows. The CHI-Bench study tested 30 AI agents across 75 clinical workflows and found significant reliability problems. Despite claims that these agents are ready for long workflows, they struggled with real clinical cases, raising concerns about their effectiveness in healthcare settings.

Original article: AP News
Opening excerpt (first ~120 words)

Claude, GPT, Gemini Agents Fail 72% of U.S. Healthcare Workflows, New Benchmark Finds

[Figure 1: CHI-Bench Engine: Simulated Worlds for Clinical Healthcare In-Situ Workflows.]
[Figure 2: CHI-Bench results across agent harnesses and frontier models.]

Excerpt limited to ~120 words for fair-use compliance. The full article is at AP News.


