
First Gemma 4 ExecuTorch Deployment on Raspberry Pi 5 — and Why It's 7.7× Slower Than llama.cpp

#executorch #edgeai #raspberrypi #gemma4
⚡ TL;DR · AI summary

The first documented deployment of Gemma 4 with ExecuTorch on a Raspberry Pi 5 shows a large performance gap relative to ARM's published benchmarks. The deployment produced bit-exact output but ran 7.7× slower than llama.cpp. Several issues surfaced along the way, underscoring the difficulty of deploying on hardware without SME2.
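Whether a given board belongs to the SME2 class that ARM's figures target can be checked from userspace. The following is a minimal sketch, not taken from the original article: it assumes a Linux arm64 system whose kernel lists an `sme2` flag on the `Features` line of `/proc/cpuinfo` (recent kernels do this on SME2-capable cores), and the helper names `cpu_features` and `has_sme2` are hypothetical.

```python
from pathlib import Path


def cpu_features(cpuinfo_text: str) -> set[str]:
    """Collect the flags from every 'Features' line of /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # arm64 kernels print one "Features : ..." line per core.
        if sep and key.strip().lower() == "features":
            flags.update(value.split())
    return flags


def has_sme2(cpuinfo_text: str) -> bool:
    """Return True if the text advertises the 'sme2' flag (assumed flag name)."""
    return "sme2" in cpu_features(cpuinfo_text)


if __name__ == "__main__":
    try:
        text = Path("/proc/cpuinfo").read_text()
    except OSError:
        text = ""  # not Linux, or the file is unreadable
    print("SME2 reported:", has_sme2(text))
```

A missing flag only means the kernel did not report SME2; it says nothing by itself about how fast a particular runtime will be on that board.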

Viik · Posted on May 25

On April 2, ARM published a blog post announcing Gemma 4 optimised for ARM devices via XNNPACK + KleidiAI, reporting 5.5× prefill speedup and 1.6× faster decode. Those numbers target Armv9 chips with SME2 (flagship phone silicon). I wanted to see what happens on the broader ARM ecosystem.

Excerpt limited to ~120 words for fair-use compliance. The full article is at DEV.to (Top).
