WeSearch

VulkanForge – 14 MB Vulkan LLM engine that runs native FP8 models on AMD (Rust)

Tags: machine learning, gpu computing, rust, vulkan, llm inference, VulkanForge, AMD, RDNA 4, gfx1201, oldnordic, ROCmForge, Meta-Llama-3.1-8B-Instruct-FP8, neuralmagic
⚡ TL;DR · AI summary

VulkanForge is a Vulkan-based LLM inference engine written in Rust, designed for AMD RDNA 4 (gfx1201) GPUs, supporting native FP8 model execution. It achieves high performance with efficient VRAM usage, enabling 14B-class models on 16 GiB GPUs. The engine supports end-to-end FP8 inference, including FP8 KV caching, and outperforms competing Vulkan implementations in decode and prefill tasks.
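To make the "native FP8" claim concrete: FP8 here means weights and KV-cache entries stored as 8-bit floats, typically the OCP E4M3 format used by neuralmagic's FP8 checkpoints. The following is a minimal, hypothetical E4M3 decoder in plain Rust, purely to illustrate what one FP8 byte encodes; it is not VulkanForge's code, which performs such conversions in GPU shaders.

```rust
// Hypothetical illustration (not the project's actual code): decode one
// OCP FP8 E4M3 byte to f32. E4M3 = 1 sign bit, 4 exponent bits (bias 7),
// 3 mantissa bits; the format has no infinities, and S.1111.111 is NaN.
fn e4m3_to_f32(b: u8) -> f32 {
    let sign = if b & 0x80 != 0 { -1.0f32 } else { 1.0f32 };
    let exp = ((b >> 3) & 0x0F) as i32;
    let man = (b & 0x07) as f32;
    if exp == 0x0F && (b & 0x07) == 0x07 {
        f32::NAN // the single NaN bit pattern per sign
    } else if exp == 0 {
        sign * (man / 8.0) * 2.0f32.powi(-6) // subnormal range
    } else {
        sign * (1.0 + man / 8.0) * 2.0f32.powi(exp - 7) // normal range
    }
}

fn main() {
    assert_eq!(e4m3_to_f32(0x38), 1.0);   // exponent field 7 (bias 7), mantissa 0
    assert_eq!(e4m3_to_f32(0x7E), 448.0); // largest finite E4M3 value
    assert!(e4m3_to_f32(0x7F).is_nan());  // NaN pattern
    println!("ok");
}
```

One byte per value is what lets an 8B-parameter FP8 model plus its KV cache fit comfortably in 16 GiB of VRAM.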

Opening excerpt (first ~120 words)

VulkanForge: A Vulkan-based LLM inference engine in Rust, targeting AMD RDNA 4 (gfx1201). Compute-only — no swapchain, no graphics queues — built directly on ash 0.38 (Vulkan 1.3) rather than a higher-level wrapper.

This project builds on the foundational work of oldnordic. Without his original ROCmForge implementation — the model loader, the CPU inference path, the GGUF parser, and the overall architecture — none of the WMMA matrix-core optimisations, the multi-model support, or the interactive chat CLI would have been possible. Thank you for making this project a reality.

Status: v0.3.4 — native FP8 LLM end-to-end, multi-submit prefill, Q3_K / Q5_K coopmat, 14B-class headroom on 16 GiB.

Excerpt limited to ~120 words for fair-use compliance. The full article is at GitHub.
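The excerpt's "compute-only — no swapchain, no graphics queues" design means the engine only needs a Vulkan queue family with compute capability, ideally a dedicated one. A hedged sketch of what such a selection could look like, using plain bit constants that mirror VkQueueFlagBits; this is illustrative logic, not the project's actual ash code:

```rust
// Bit values mirroring Vulkan's VkQueueFlagBits (illustrative, not ash types).
const GRAPHICS: u32 = 0x1; // VK_QUEUE_GRAPHICS_BIT
const COMPUTE: u32 = 0x2;  // VK_QUEUE_COMPUTE_BIT
const TRANSFER: u32 = 0x4; // VK_QUEUE_TRANSFER_BIT

/// Prefer a dedicated compute queue family (COMPUTE set, GRAPHICS clear),
/// so inference work does not contend with any graphics workload;
/// fall back to any family that supports compute at all.
fn pick_compute_family(families: &[u32]) -> Option<usize> {
    families
        .iter()
        .position(|&f| f & COMPUTE != 0 && f & GRAPHICS == 0)
        .or_else(|| families.iter().position(|&f| f & COMPUTE != 0))
}

fn main() {
    // A typical AMD layout: family 0 = graphics+compute+transfer,
    // family 1 = dedicated compute, family 2 = DMA transfer.
    let families = [GRAPHICS | COMPUTE | TRANSFER, COMPUTE | TRANSFER, TRANSFER];
    println!("{:?}", pick_compute_family(&families)); // Some(1)
}
```

In real ash code the flags would come from `Instance::get_physical_device_queue_family_properties`, but the selection logic is the same bit test.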
