
Quant Qwen3.6-27B on 16GB VRAM with 100k context length


I experimented with running Qwen3.6-27B on my laptop with an A5000 16GB GPU. I created my own IQ4_XS GGUF, "qwen3.6-27b-IQ4_XS-pure.gguf", using the Unsloth imatrix, and compared its mean KLD against other quants. I also tested different turboquant versions; the buun-llama-cpp fork looks better than the TheTom/llama-cpp-turboquant fork. If you want to try my version, do the following: download my GGUF from Hugging Face. It already contains an impro
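As a minimal sketch of the comparison above: mean KLD here means the average per-token KL divergence between the full-precision model's next-token distribution and the quant's, computed from their logits (llama.cpp's `llama-perplexity` tool has a `--kl-divergence` mode for this against a saved logit file). Assuming you have two logit arrays of shape `(n_tokens, vocab_size)`, the metric can be computed like this:

```python
import numpy as np

def mean_kld(ref_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    """Mean per-token KL divergence D(ref || quant), rows are
    next-token logits of shape (n_tokens, vocab_size)."""
    def log_softmax(x: np.ndarray) -> np.ndarray:
        # subtract the row max for numerical stability
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

    ref_lp = log_softmax(ref_logits)
    quant_lp = log_softmax(quant_logits)
    # KL(p || q) = sum_v p(v) * (log p(v) - log q(v)), then average over tokens
    kld_per_token = (np.exp(ref_lp) * (ref_lp - quant_lp)).sum(axis=-1)
    return float(kld_per_token.mean())

# Identical logits give zero divergence; any perturbation gives a positive value.
ref = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]])
print(mean_kld(ref, ref))        # → 0.0 (up to float rounding)
print(mean_kld(ref, ref + 0.5))  # shifting all logits equally also leaves it ~0
```

A lower mean KLD means the quant's output distribution tracks the reference more closely, which is why it is a stricter quality signal than perplexity alone.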

Originally posted on LocalLlama.
