Skills Without Evals Are Just Markdown and Hope


TL;DR. I built an Anthropic Agent Skill for @ngrx/signals and ran it through the full eval pipeline: capability A/B benchmarks, token and wall-time accounting, and a description-optimizer loop. The skill lifts pass rate from 84% to 100%. It also adds 14 seconds and ~12,000 tokens per invocation (about $0.04 at Sonnet 4.6 input pricing). The description optimizer ran for three iterations and never beat the description I started with. And my evals are now saturated at 100%, which is itself a probl
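The cost and pass-rate figures in the TL;DR can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the rate of $3 per million input tokens is an assumption rather than a number stated in the article. It also treats all ~12,000 added tokens as input tokens, which may not match how the author accounted for them.

```typescript
// Hypothetical helpers for sanity-checking the TL;DR figures.
// ASSUMPTION: $3 per million input tokens; the article does not state the rate.
const ASSUMED_USD_PER_MILLION_INPUT_TOKENS = 3;

// Estimated extra cost of one invocation, given the added input tokens.
function addedInputCostUsd(
  extraTokens: number,
  usdPerMillionTokens: number = ASSUMED_USD_PER_MILLION_INPUT_TOKENS,
): number {
  return (extraTokens / 1e6) * usdPerMillionTokens;
}

// Pass-rate lift in percentage points between baseline and with-skill runs.
function passRateLiftPoints(baselinePct: number, withSkillPct: number): number {
  return withSkillPct - baselinePct;
}

const costPerInvocation = addedInputCostUsd(12000); // ~12,000 tokens from the TL;DR
const lift = passRateLiftPoints(84, 100); // 84% -> 100% from the TL;DR

console.log(`added cost per invocation: ~$${costPerInvocation.toFixed(3)}`);
console.log(`pass-rate lift: ${lift} percentage points`);
```

Under the assumed rate, 12,000 tokens come to roughly $0.036, which is consistent with the "about $0.04" quoted above, and the lift is 16 percentage points.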

Original article: DEV Community

Daniel Sogl · Posted on May 1 · #claude #ai #angular #ngrx

