llm-eval-kit: Modular, self-refining LLM evaluation framework (v0.3.0)
llm-eval-kit is a modular, open-source framework for evaluating large language models across eight orthogonal dimensions. It provides explainable, actionable feedback and improves response quality through iterative refinement: self-refinement loops, multi-model evaluation, and customizable plugins take it beyond single-score metrics. The tool ships with a command-line interface, visualization options, and integrations with major LLM providers.
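To make the self-refinement loop concrete, here is a minimal Python sketch of the score, feedback, revise cycle the summary describes: judge a response on several axes, rewrite it guided by the weakest axis, and stop once the mean score plateaus. Every name here (AXES, judge, revise, EPS, MAX_ROUNDS) is a hypothetical illustration of the pattern, not llm-eval-kit's actual API.

```python
# Sketch of a score -> feedback -> revise loop that stops when the
# mean axis score plateaus. All names are hypothetical stand-ins;
# llm-eval-kit's real API may differ.

from statistics import mean

AXES = [  # illustrative stand-ins for the eight orthogonal axes
    "accuracy", "completeness", "clarity", "relevance",
    "coherence", "conciseness", "safety", "style",
]
EPS = 0.01        # minimum improvement required to keep iterating
MAX_ROUNDS = 5    # hard cap on refinement rounds


def judge(response: str) -> dict[str, float]:
    """Score a response on each axis in [0, 1].

    A real framework would ask one or more judge LLMs; this stub
    rewards longer responses so the loop has something to converge on.
    """
    score = min(1.0, len(response) / 400)
    return {axis: score for axis in AXES}


def revise(response: str, scores: dict[str, float]) -> str:
    """Rewrite the response guided by per-axis feedback.

    Stub: a real implementation would prompt the model to fix the
    lowest-scoring axis; here we just append a marker.
    """
    weakest = min(scores, key=scores.get)
    return response + f" [revised for {weakest}]"


def refine(response: str) -> tuple[str, dict[str, float]]:
    """Loop until the mean axis score stops improving."""
    scores = judge(response)
    for _ in range(MAX_ROUNDS):
        candidate = revise(response, scores)
        new_scores = judge(candidate)
        if mean(new_scores.values()) - mean(scores.values()) < EPS:
            break  # quality plateaued; keep the previous response
        response, scores = candidate, new_scores
    return response, scores


if __name__ == "__main__":
    final, scores = refine("The model's draft answer.")
    print(final)
    print({axis: round(v, 2) for axis, v in scores.items()})
```

The MAX_ROUNDS cap is there because a plateau check alone is not enough: if scores oscillate or climb in tiny steps, the loop still terminates.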
Opening excerpt (first ~120 words):
🚀 Just launched! If you find this useful, give it a star — it's the only metric that helps me justify spending more time on it.

🔬 llm-eval-kit

The modular, explainable, self-refining evaluation framework for LLMs. Score on 8 orthogonal axes. Loop until quality plateaus. Get a human-readable verdict.

📖 Documentation · 🚀 Quickstart · 🧩 Examples · 🗺️ Roadmap · 🤝 Contributing

A single number tells you nothing. llm-eval-kit tells you why a response is good or bad — and how to fix it.
…
Excerpt limited to ~120 words for fair-use compliance; the full README is available on GitHub.