Quant Qwen3.6-27B on 16GB VRAM with 100k context length
I've been experimenting with running Qwen3.6-27B on my laptop with an A5000 16GB GPU. I created my own IQ4_XS GGUF, "qwen3.6-27b-IQ4_XS-pure.gguf", using the Unsloth imatrix, and compared its mean KLD against other quants. As you can see, I also tested different turboquant versions; the buun-llama-cpp fork looks better than the TheTom/llama-cpp-turboquant fork. If you want to try my version, do the following: download my GGUF from Hugging Face. It already contains an impro
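For reference, here is a rough sketch of how a setup like this is typically launched and evaluated with llama.cpp's standard tools. The flag values (layer offload count, context size, KV-cache types) and the f16 base filename are my assumptions, not something from the post — tune them for your own hardware:

```shell
# Serve the quant with a long context on a 16GB GPU (values are guesses):
llama-server \
  -m qwen3.6-27b-IQ4_XS-pure.gguf \
  -ngl 99 \             # offload all layers to the GPU
  -c 102400 \           # ~100k-token context window
  -fa \                 # flash attention (needed for quantized KV cache)
  -ctk q8_0 -ctv q8_0   # q8_0 KV cache so the long context fits in VRAM

# Mean-KLD comparison workflow with llama-perplexity:
# 1. record full-precision logits on a text sample (filenames are placeholders)
llama-perplexity -m qwen3.6-27b-f16.gguf -f sample.txt \
  --kl-divergence-base qwen3.6-27b.kld
# 2. replay the same sample through a quant and report mean KLD
llama-perplexity -m qwen3.6-27b-IQ4_XS-pure.gguf \
  --kl-divergence-base qwen3.6-27b.kld --kl-divergence
```

Quantizing the KV cache roughly halves its VRAM footprint versus f16, which is usually what makes a 100k context viable next to a ~14GB model on a 16GB card.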
LocalLlama