
Running llama.cpp on Snapdragon Hexagon NPU seems promising


I have a OnePlus 12 with a Snapdragon 8 Gen 3. I followed the above README to cross-compile llama.cpp on Ubuntu and then copied the binaries over to the Termux directory on the phone. It seems llama.cpp's Hexagon backend is actively supported by Qualcomm, with many PRs coming from Qualcomm employees. I am getting 8 t/s prompt processing (pp) and 4.5 t/s token generation (tg) with gemma-3-12b-it-qat-Q4_0, and 20 t/s pp / 12.5 t/s tg with gemma-3-4b-it-qat-Q4_0. Speed is about the same as on the SD8G3's CPU, but the phone is not hot at all and the tg speed is good enough.
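For anyone wanting to try the same workflow, a rough sketch of the cross-compile-and-run steps is below. The Android NDK toolchain-file arguments are standard, but the `GGML_HEXAGON` option name and any Hexagon SDK paths are assumptions on my part; check the Hexagon backend README in the llama.cpp repo for the exact flags it currently expects.

```shell
# On the Ubuntu build machine (assumes Android NDK is installed and
# ANDROID_NDK points at it; GGML_HEXAGON is an assumed option name --
# verify against llama.cpp's Hexagon backend README).
cmake -B build \
  -DCMAKE_TOOLCHAIN_FILE="$ANDROID_NDK/build/cmake/android.toolchain.cmake" \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-31 \
  -DGGML_HEXAGON=ON
cmake --build build -j

# Push the resulting binaries and a model to the phone, e.g. via adb:
adb push build/bin/llama-cli /sdcard/Download/

# Then, inside Termux on the phone, copy them into Termux's home
# directory, make them executable, and run:
./llama-cli -m gemma-3-4b-it-qat-Q4_0.gguf -p "Hello" -n 64
```

The key point is that the binaries must be built against the Android toolchain rather than glibc, since Termux runs on Bionic; a plain `cmake` build on Ubuntu will not run on the phone.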

Originally posted on LocalLlama.
