Capturing LLM Capabilities via Evidence-Calibrated Query Clustering

May 19, 2026 · 4:00 AM UTC · 2 min read

#artificial intelligence #machine learning #query clustering

TL;DR · WeSearch summary

The paper presents ECC, an algorithm that clusters queries by their latent capability demands. It targets capability-aware evaluation of large language models (LLMs) by correcting the mismatch between surface-level semantics and actual model performance. In the paper's evaluations, ECC improves capability ranking quality over both human-labeled and embedding-based baselines.

Key facts

▪ ECC calibrates prior semantic embeddings using limited posterior model comparisons.
▪ The algorithm characterizes each cluster through a capability profile parameterized by a Bradley-Terry model.
▪ ECC outperforms human-labeled and embedding-based baselines by an average of 17.64 and 18.02 percentage points, respectively.
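
To make the "capability profile" idea concrete, here is a minimal sketch of fitting one Bradley-Terry model per cluster from pairwise model comparisons. It is not the paper's ECC algorithm: it assumes the cluster assignments are already given, and it uses the standard minorization-maximization update for Bradley-Terry strengths. The function names (`fit_bradley_terry`, `cluster_profiles`), the model names, and the cluster labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, n_iter=200, tol=1e-9):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Uses the classic MM update p_i <- W_i / sum_j n_ij / (p_i + p_j),
    then renormalizes so the strengths sum to 1.
    """
    wins = defaultdict(float)         # W_i: total wins of model i
    pair_counts = defaultdict(float)  # n_ij: comparisons between i and j
    models = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        models.update((winner, loser))
    models = sorted(models)

    strength = {m: 1.0 / len(models) for m in models}
    for _ in range(n_iter):
        updated = {}
        for i in models:
            denom = 0.0
            for j in models:
                if i == j:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0.0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            # A model with no recorded comparisons keeps its old strength.
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        updated = {m: s / total for m, s in updated.items()}
        delta = max(abs(updated[m] - strength[m]) for m in models)
        strength = updated
        if delta < tol:
            break
    return strength


def cluster_profiles(records):
    """Group (cluster_id, winner, loser) records and fit one profile per cluster."""
    by_cluster = defaultdict(list)
    for cluster_id, winner, loser in records:
        by_cluster[cluster_id].append((winner, loser))
    return {cid: fit_bradley_terry(pairs) for cid, pairs in by_cluster.items()}


# Hypothetical comparison outcomes: model_a is stronger on "math" queries,
# model_b is stronger on "code" queries.
records = (
    [("math", "model_a", "model_b")] * 3
    + [("math", "model_b", "model_a")] * 1
    + [("code", "model_b", "model_a")] * 3
    + [("code", "model_a", "model_b")] * 1
)
profiles = cluster_profiles(records)
for cluster_id, profile in profiles.items():
    ranking = sorted(profile, key=profile.get, reverse=True)
    print(cluster_id, ranking, profile)
```

With these toy records the two clusters produce opposite rankings of the same two models, which is the kind of per-cluster difference a capability profile is meant to expose. A model with zero wins in a cluster is driven toward zero strength by this update, so a practical version would likely add smoothing or a prior.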

Original article

arXiv cs.AI

Opening excerpt (first ~120 words)

arXiv:2605.17110 (cs) · Submitted on 16 May 2026

Title: Capturing LLM Capabilities via Evidence-Calibrated Query Clustering

Authors: Fangzhou Wu, Sandeep Silwal, Qiuyi Zhang

Abstract: Query clustering organizes queries into groups that reflect shared latent capability demands, enabling capability-aware LLM evaluation. Existing clustering methods, which primarily rely on semantic taxonomies or embeddings, often fail to capture such latent capability requirements due to a misalignment between surface-level semantics and actual model performance.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
