Gemma 4 wrote three summaries in one response. The middle one was a self-disclaimer.
At num_ctx=2048, Gemma 4 E2B produced three sequential outputs in a single response: a mostly hallucinated meeting summary, a note saying that the summary isn't actually in the transcript, and a more careful retry. The pattern was identical across three runs at temperature=0.0. The behavior changes with the context window setting (num_ctx), and the findings contradict earlier claims about the model's calibration.
- At num_ctx=2048, Gemma 4 E2B produces a hedged output; at num_ctx=32768, it produces confident summaries without hedging.
- The model's ability to distinguish between syntactic and semantic damage was tested through various input scenarios.
- Initial claims about the model's calibration were found to be inaccurate after further testing and feedback from engineers.
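The comparison in the first bullet can be sketched against Ollama's local REST API. This is a minimal sketch, not the author's harness: the model tag `gemma4:e2b` is an assumption (use whatever tag your local install reports), the helper names are hypothetical, and the `Note:` check is a crude heuristic for spotting the self-disclaimer, not a measure of calibration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "gemma4:e2b"  # ASSUMPTION: placeholder tag, replace with your local model tag


def build_payload(prompt, num_ctx, model=MODEL_TAG):
    """Build a non-streaming /api/generate request body with a fixed context window."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx, "temperature": 0.0},
    }


def generate(prompt, num_ctx):
    """Send one request to a running Ollama server and return the generated text."""
    body = json.dumps(build_payload(prompt, num_ctx)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def has_disclaimer(text):
    """Crude heuristic (hypothetical): does the output contain a 'Note:' marker?"""
    return "Note:" in text


def compare(prompt, runs=3, windows=(2048, 32768), generate_fn=generate):
    """Run the same prompt several times per context window and summarize the outputs."""
    results = {}
    for num_ctx in windows:
        outputs = [generate_fn(prompt, num_ctx) for _ in range(runs)]
        results[num_ctx] = {
            "identical": len(set(outputs)) == 1,  # same text on every run?
            "disclaimer": [has_disclaimer(o) for o in outputs],
        }
    return results
```

Calling `compare(transcript_text)` with a server running would report, per `num_ctx`, whether the runs matched and whether each output contained a `Note:` line. The `generate_fn` parameter exists only so the comparison logic can be exercised without a live model.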
Opening excerpt (first ~120 words)
thehwang, posted on May 20
#gemma #llm #ollama #ablation
The short version, in case the title was being coy: at num_ctx=2048, Gemma 4 E2B produces three sequential outputs in a single response: a mostly-hallucinated meeting summary, a Note: saying that summary isn't actually in the transcript, then a more careful retry. Three runs at temperature=0.0, identical pattern every time.
…
Excerpt limited to ~120 words. The full article is at DEV.to.