15 results for "data quality"
MetaGAI: A Large-Scale and High-Quality Benchmark for Generative AI Model and Data Card Generation
The rapid proliferation of Generative AI necessitates rigorous documentation standards for transparency and governance. However, manual creation of Model and Data Cards is not scalable, while automate…
LabelSets — open quality standard for AI training data (LQS v3.1) [D]
Built a third-party quality rating system for ML datasets. Multi-oracle (7 scorers across 5 algorithm families), conformal prediction intervals on downstream F1, Ed25519-signed certs, and a contaminat…
Why Data Quality is Becoming More Important Than Model Size in Modern AI Systems
For years, progress in artificial intelligence was closely tied to scaling laws, where increasing...…
ParkingScenes: A Structured Dataset for End-to-End Autonomous Parking in Simulation Scenes
Autonomous parking remains a critical yet challenging task in intelligent driving systems, particularly within constrained urban environments where maneuvering space is limited and precise control is …
Explanation Quality Assessment as Ranking with Listwise Rewards
We reformulate explanation quality assessment as a ranking problem rather than a generation problem. Instead of optimizing models to produce a single "best" explanation token-by-token, we train reward…
HUD Says Realtors Can Now Speak the Truth
HUD: The U.S. Department of Housing and Urban Development (HUD) sent a “Dear Colleague” letter to real estate professionals clarifying they are not violating the Fair Housing Act when they share infor…
HUD Says It’s Legal to Tell the Truth
HUD: The U.S. Department of Housing and Urban Development (HUD) sent a “Dear Colleague” letter to real estate professionals clarifying they are not violating the Fair Housing Act when they share infor…
Do the "*Claude-4.6-Opus-Reasoning-Distilled" really bring something new to the original models?
No offense to the fine-tune model providers, just curious. IMO the original models were already trained on massive amount of high quality data, so why bother with this fine-tune? Just to make the mode…
An Intelligent Fault Diagnosis Method for General Aviation Aircraft Based on Multi-Fidelity Digital Twin and FMEA Knowledge Enhancement
Fault diagnosis of general aviation aircraft faces challenges including scarce real fault data, diverse fault types, and weak fault signatures. This paper proposes an intelligent fault diagnosis frame…
Judging the Judges: A Systematic Evaluation of Bias Mitigation Strategies in LLM-as-a-Judge Pipelines
LLM-as-a-Judge has become the dominant paradigm for evaluating language model outputs, yet LLM judges exhibit systematic biases that compromise evaluation reliability. We present a comprehensive empir…
SoccerRef-Agents: Multi-Agent System for Automated Soccer Refereeing
Refereeing is vital in sports, where fair, accurate, and explainable decisions are fundamental. While intelligent assistant technologies are being widely adopted in soccer refereeing, current AI-assis…
SemML 2.0: Synthesizing Controllers for LTL
Synthesizing a reactive system from specifications given in linear temporal logic (LTL) is a classical problem, finding its applications in safety-critical systems design. These systems are typically …
STELLAR-E: a Synthetic, Tailored, End-to-end LLM Application Rigorous Evaluator
The increasing reliance on Large Language Models (LLMs) across diverse sectors highlights the need for robust domain-specific and language-specific evaluation datasets; however, the collection of such…
A systematic evaluation of vision-language models for observational astronomical reasoning tasks
Vision-language models (VLMs) are increasingly proposed as general-purpose tools for scientific data interpretation, yet their reliability on real astronomical observations across diverse modalities r…
Microsoft TRELLIS.2: An Open-Source, 4B-Parameter, Image-to-3D Model [pdf]
Recent advancements in 3D generative modeling have significantly improved the generation realism, yet the field is still hampered by existing representations, which struggle to capture assets with com…