WeSearch
SEARCH · AI ERRORS

Results for "ai errors".

20 stories match your query across our 700+ source catalog, ranked by relevance and recency.


CLAUDEAI

Claude Status Update: Claude.ai unavailable and elevated errors on the API on 2026-04-28T18:33:55.000Z

This is an automatic post triggered within 2 minutes of an official Claude system status update. Incident: Claude.ai unavailable and elevated errors on the API. Check on progress and whether or not the…

· 14 views
CLAUDEAI

Claude Status Update: Claude.ai unavailable and elevated errors on the API on 2026-04-28T17:51:36.000Z

This is an automatic post triggered within 2 minutes of an official Claude system status update. Incident: Claude.ai unavailable and elevated errors on the API. Check on progress and whether or not the…

· 13 views
DEV.TO (TOP)

Pylon: Self-Host Your Own AI Agent Pipeline That Fixes Sentry Errors via

Pylon is a self-hosted daemon that triggers sandboxed Claude Code agents from webhooks (Sentry, cron, chat) and reports results with human approval —…

· 3 views
CLAUDEAI

Claude Status Update: Elevated errors on Claude Haiku 4.5 on 2026-04-28T12:38:38.000Z

· 4 views
CLAUDE

Claude.ai is unavailable

Claude's Status Page - Claude.ai unavailable and elevated errors on the API…

· 12 views
ARXIV.ORG

CAP-CoT: Cycle Adversarial Prompt for Improving Chain of Thoughts in LLM Reasoning

Chain-of-Thought (CoT) prompting has emerged as a simple and effective way to elicit step-by-step solutions from large language models (LLMs). However, CoT reasoning can be unstable across runs on lon…

· 3 views
ARXIV.ORG

When Corrective Hints Hurt: Prompt Design in Reasoner-Guided Repair of LLM Overcaution on Entailed Negations under OWL 2 DL

We report a reproducible error pattern in GPT-5.4 on OWL 2 DL compliance queries: the model frequently answers "unknown" when the reasoner-entailed answer is "no" under FunctionalProperty c…

· 4 views
ARXIV.ORG

FinGround: Detecting and Grounding Financial Hallucinations via Atomic Claim Verification

Financial AI systems must produce answers grounded in specific regulatory filings, yet current LLMs fabricate metrics, invent citations, and miscalculate derived quantities. These errors carry direct …

· 3 views
ARXIV.ORG

CT-FineBench: A Diagnostic Fidelity Benchmark for Fine-Grained Evaluation of CT Report Generation

The evaluation of generated reports remains a critical challenge in Computed Tomography (CT) report generation, due to the large volume of text, the diversity and complexity of findings, and the prese…

· 6 views
ARXIV.ORG

Credal Concept Bottleneck Models for Epistemic-Aleatoric Uncertainty Decomposition

Concept Bottleneck Models (CBMs) predict through human-interpretable concepts, but they typically output point concept probabilities that conflate epistemic uncertainty (reducible model underspecifica…

· 3 views
ARXIV.ORG

Agentic clinical reasoning over longitudinal myeloma records: a retrospective evaluation against expert consensus

Multiple myeloma is managed through sequential lines of therapy over years to decades, with each decision depending on cumulative disease history distributed across dozens to hundreds of heterogeneous…

· 3 views
ARXIV.ORG

OpenGame: Open Agentic Coding for Games

Game development sits at the intersection of creative design and intricate software engineering, demanding the joint orchestration of game engines, real-time loops, and tightly coupled state across ma…

· 2 views
ARXIV CS.AI

RADIANT-LLM: an Agentic Retrieval Augmented Generation Framework for Reliable Decision Support in Safety-Critical Nuclear Engineering

Reliable decision support in nuclear engineering requires traceable, domain-grounded knowledge retrieval, yet safety and risk analysis workflows remain hampered by fragmented documentation and halluci…

· 1 view
ARXIV CS.AI

When VLMs 'Fix' Students: Identifying and Penalizing Over-Correction in the Evaluation of Multi-line Handwritten Math OCR

Accurate transcription of handwritten mathematics is crucial for educational AI systems, yet current benchmarks fail to evaluate this capability properly. Most prior studies focus on single-line expre…

· 4 views
ARXIV CS.AI

Parameter Efficiency Is Not Memory Efficiency: Rethinking Fine-Tuning for On-Device LLM Adaptation

Parameter-Efficient Fine-Tuning (PEFT) has become the standard for adapting large language models (LLMs). In this work we challenge the widespread assumption that parameter efficiency equates memory …

· 2 views
ARXIV CS.AI

DO-Bench: An Attributable Benchmark for Diagnosing Object Hallucination in Vision-Language Models

Object-level hallucination remains a central reliability challenge for vision-language models (VLMs), particularly in binary object existence verification. Existing benchmarks emphasize aggregate accu…

· 3 views
ARXIV.ORG

LLMs Corrupt Your Documents When You Delegate

Large Language Models (LLMs) are poised to disrupt knowledge work, with the emergence of delegated work as a new interaction paradigm (e.g., vibe coding). Delegation requires trust - the expectation t…

· 5 views
ARXIV.ORG

A Systematic Approach for Large Language Models Debugging

Large language models (LLMs) have become central to modern AI workflows, powering applications from open-ended text generation to complex agent-based reasoning. However, debugging these models remains…

· 4 views
ARXIV.ORG

Discovering Agentic Safety Specifications from 1-Bit Danger Signals

Can large language model agents discover hidden safety objectives through experience alone? We introduce EPO-Safe (Experiential Prompt Optimization for Safe Agents), a framework where an LLM iterative…

· 3 views
ARXIV.ORG

Causal Discovery as Dialectical Aggregation: A Quantitative Argumentation Framework

Constraint-based causal discovery is brittle in finite-sample regimes because erroneous conditional-independence (CI) decisions can cascade into substantial structural errors. We propose Quantitative …

· 3 views