2 results for "preference learning"
ARXIV.ORG
Aligning with Your Own Voice: Self-Corrected Preference Learning for Hallucination Mitigation in LVLMs
Large Vision-Language Models (LVLMs) frequently suffer from hallucinations. Existing preference learning-based approaches largely rely on proprietary models to construct preference datasets. We identi…
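The snippet cuts off mid-sentence, but the technique it names, preference learning over (chosen, rejected) response pairs, is standard. As a hedged illustration only (not this paper's self-corrected method), here is a minimal DPO-style preference loss; the tensor names, shapes, and the `beta` temperature are assumptions:

```python
# Hypothetical sketch of a generic pairwise preference-learning objective
# (DPO-style); NOT the paper's method. Names and shapes are assumptions.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Each input is a batch of summed token log-probabilities for a
    response under the trained policy or a frozen reference model."""
    # Log-ratio of policy vs. reference for each response in the pair.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Push the chosen response's log-ratio above the rejected one's.
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```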
ARXIV.ORG
Explanation Quality Assessment as Ranking with Listwise Rewards
We reformulate explanation quality assessment as a ranking problem rather than a generation problem. Instead of optimizing models to produce a single "best" explanation token-by-token, we train reward…
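The abstract is truncated, so the paper's exact reward formulation is unknown; as an assumption-laden sketch of what "ranking with listwise rewards" could mean in practice, here is a Plackett-Luce (ListMLE-style) listwise loss: given reward-model scores for k candidate explanations and a gold ordering, it maximizes the likelihood of that ordering rather than scoring candidates one at a time.

```python
# Hypothetical listwise ranking loss (Plackett-Luce / ListMLE style);
# the paper's actual reward model may differ.
import torch

def listmle_loss(scores: torch.Tensor, ranking: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of a gold ordering under Plackett-Luce.

    scores:  (batch, k) reward-model scores for k candidate explanations.
    ranking: (batch, k) candidate indices ordered from best to worst.
    """
    # Reorder scores so column 0 holds the best candidate, and so on.
    ordered = torch.gather(scores, 1, ranking)
    # At position i, the log-probability that candidate i outranks all
    # remaining candidates is ordered[:, i] - logsumexp(ordered[:, i:]).
    # A reversed cumulative logsumexp yields all denominators in one pass.
    rev = torch.flip(ordered, dims=[1])
    denom = torch.flip(torch.logcumsumexp(rev, dim=1), dims=[1])
    return (denom - ordered).sum(dim=1).mean()
```

A pairwise loss like the one above compares two candidates at a time; the listwise form instead conditions each candidate's probability on everything ranked below it, which is what distinguishes the "listwise" framing the title names.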