Gemma 4 Soft Tokens: The Rise and Fall of 16x16 Words ⚡👀

May 24, 2026 · 10:54 PM UTC · 20 min read

#technology #artificial intelligence #machine learning


⚡ TL;DR · AI summary

Gemma 4 advances the vision capabilities of the Gemma family, whose first and second generations had no native vision support. The model processes images as 48x48 soft tokens rather than the earlier 16x16 patch representation. This change improves how visual information is integrated into the model's architecture.

Key facts

▪ Gemma 4 features native vision capabilities across all its variants.
▪ The model processes images using 48x48 soft tokens, fundamentally changing how visual information is represented.
▪ The introduction of the Vision Transformer (ViT) allowed transformers to be applied effectively to images.
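
For readers unfamiliar with the "16x16 words" in the title, here is a minimal sketch of the patch idea behind the original ViT: an image is cut into non-overlapping 16x16 patches, and each flattened patch becomes one token. The helper names `patchify` and `num_patches` and the toy 224x224 single-channel image are assumptions made for illustration. This does not reproduce Gemma 4's vision encoder or its soft tokens.

```python
def patchify(image, patch_size=16):
    """Split an H x W image (list of rows) into flattened, non-overlapping patches.

    Hypothetical helper for illustration only. Each returned patch is one
    "word" in the ViT sense: a flat list of patch_size * patch_size pixel
    values, with patches ordered row-major across the image.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size window into a single list.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


def num_patches(height, width, patch_size=16):
    """Number of patch tokens this tokenizer emits for an image."""
    return (height // patch_size) * (width // patch_size)


# A toy 224 x 224 single-channel image whose pixel value encodes its position.
image = [[row * 224 + col for col in range(224)] for row in range(224)]
patches = patchify(image)
print(len(patches), len(patches[0]))  # 196 256
```

A 224x224 input therefore yields 14 x 14 = 196 tokens of 256 values each. In a real ViT each flattened patch is then linearly projected into an embedding before entering the transformer; that step is omitted here.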

Original article

DEV.to (Top)


Youdiowei Eteimorde · Posted on May 24

#devchallenge #gemmachallenge #gemma

This is a submission for the Gemma 4 Challenge: Write About Gemma 4.

The road to vision capabilities in the Gemma family has been an interesting one. The first and second generations of Gemma models did not include native vision support.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at DEV.to (Top).
