Tell HN: Gemini 3.5 Flash breaks in stupid ways

May 22, 2026 · 12:12 AM UTC · 1 min read

via Ycombinator

TL;DR · WeSearch summary

A user reports that Gemini 3.5 Flash misbehaves when used to grade answers. Once a 'Grading criteria' text is added to the prompt, the model compresses its scores toward the center of the scale, for example giving 7 instead of 10 for correct answers. The author questions how reliable current state-of-the-art models are outside coding and tool usage, where they have clearly improved.

Key facts

▪Gemini 3.5 Flash kept giving 7 instead of 10 for correct answers.
▪Adding a 'Grading criteria' text triggers the collapse toward the center of the scale.
▪After a request on X, the author reproduced the issue on the first try in Gemini Chat.

Original article


Opening excerpt (first ~120 words)

I thought I was going crazy, trying to use Gemini 3.5 Flash to rate some answers, but it kept giving 7 instead of 10 for correct answers.

Apparently once you add a "Grading criteria" text, the model collapses into a "compressed toward the center of the scale" hallucination (or training set overfitting).

Someone on X asked me to try to reproduce it, and I actually got it on the first try on their Gemini Chat: https://x.com/XCSme/status/2057613611959279988

I am not sure what to make of this model (or most SOTA models). They got a lot smarter with coding and tool usage, but a lot dumber in other ways...
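One way to put a number on the "compressed toward the center" effect is sketched below. This is not from the original post: the helper name `center_shift`, the 1 to 10 scale, and all score lists are assumptions made up for illustration (only the "7 instead of 10" pattern comes from the post). No model is called; you would substitute scores collected from your own runs with and without the "Grading criteria" text.

```python
from statistics import mean


def center_shift(scores, expected, lo=1, hi=10):
    """Average number of points by which scores sit closer to the
    scale midpoint than the expected scores do.

    0 means no compression; larger positive values mean the grader
    is pulling scores toward the middle of the scale.
    """
    mid = (lo + hi) / 2
    return mean(abs(e - mid) - abs(s - mid) for s, e in zip(scores, expected))


# Hypothetical data, NOT measured results: expected grades for four answers
# and the grades returned by two prompt variants.
expected = [10, 10, 1, 10]
without_criteria = [10, 10, 1, 10]  # assumed: prompt without "Grading criteria"
with_criteria = [7, 7, 4, 7]        # assumed: prompt with "Grading criteria"

print(center_shift(without_criteria, expected))
print(center_shift(with_criteria, expected))
```

A value near zero for one prompt variant and a clearly positive value for the other would be consistent with the behavior described above, although a handful of samples proves little either way.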

