Ask HN: Local model experiences with 'high-reasoning distill' finetunes
The post asks about experiences with finetunes of small models (<40B), focusing on the 'Opus-Reasoning' finetunes. The author notes that although these models score better on some public benchmarks, in practical coding work they tend to produce messier, buggier code. The post invites others to share their experiences and preferred finetunes.
- The author has mostly worked with 'Opus-Reasoning' finetunes of Qwen models.
- Despite better results on some public benchmarks, the models tend to be overconfident and to write messier, buggier code.
- The author asks others to share their experiences with different finetunes.
Opening excerpt (first ~120 words)
What are your experiences with all the different variations of finetunes on small models (<40B) with those popular datasets? My personal experience is mostly with the 'Opus-Reasoning' ones on qwen models, and aside from the output being subjectively better looking (ascii charts and all), in actual coding performance every one I've tried tends to become a lot more overconfident, writing more messy and buggy code and tries to gaslight me that the task I give it is impossible when it cannot achieve it.

I have seen them perform better on public benchmarks in some cases, which shouldn't be ignored completely, but that doesn't seem to translate to better output on real work in my limited testing.

What are your observations? Any specific ones that you lean towards, or have had good experiences…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Ycombinator.