Diagnosing Knowledge Gaps in LLM Tool Use: An Agentic Benchmark for Novel API Acquisition

Jun 3, 2026 · 4:00 AM UTC · 3 min read

#artificial intelligence #machine learning #api #benchmarking

TL;DR · WeSearch summary

The paper introduces NovelAPIBench, a dynamic benchmark for evaluating how well large language models use novel APIs, i.e., APIs absent from their pretraining data. It highlights the role of both retrieval and parametric adaptation in improving model performance. The study finds that usage examples are crucial for learning a new API, while adding more context can sometimes introduce errors.

Key facts

▪ NovelAPIBench automates the discovery of novel APIs and generates executable coding tasks.
▪ The research compares knowledge retrieval with parametric adaptation across various tasks and models.
▪ Results indicate that usage examples are the most effective standalone signal for model training.

Original article

arXiv cs.AI

Opening excerpt (first ~120 words)

arXiv:2606.03657 (cs.AI) [Submitted on 2 Jun 2026]
Title: Diagnosing Knowledge Gaps in LLM Tool Use: An Agentic Benchmark for Novel API Acquisition
Authors: Jinnuo Liu, Yue Peng, Jinhan Niu, Hongyi Wen
Abstract: Large language models for code generation often need to use APIs that are absent from their pretraining data. This requires more than recalling a function name: models must coordinate signatures, module paths, input-output contracts, semantics, and executable usage patterns.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
