Qwen3.6-35B-A3B KLDs - INTs and NVFPs

Apr 25, 2026 · 10:41 PM UTC

KLD for INTs and NVFP4s. As always, use case is important: accuracy versus speed versus native kernels on your GPUs.

Things to note again:

- This is done in vLLM, with real logits.
- My repo ( ) has made changes in the vLLM "hot path", so it's real, it's on GPU, and it takes ~3-5 minutes on RTX 6000s.
- KLD does not lie; it's just raw math against logits.
- KLD tells a story of divergence.
- Evals are still important for anything use-case specific.
- A quant can have a worse KLD and get a better eval on a test versus a b
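To make "raw math against logits" concrete, here is a minimal NumPy sketch of a mean per-token KL divergence between a reference model and a quantized one. It is not the code from the repo mentioned above: the function names `log_softmax` and `mean_kld` are hypothetical, and it assumes both sets of logits cover the same token positions and have shape (tokens, vocab).

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last (vocab) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def mean_kld(ref_logits, quant_logits):
    """Mean KL(P_ref || P_quant) over token positions, in nats.

    Assumption: both inputs are arrays of shape (tokens, vocab) holding
    logits for the same token positions.
    """
    log_p = log_softmax(np.asarray(ref_logits, dtype=np.float64))
    log_q = log_softmax(np.asarray(quant_logits, dtype=np.float64))
    # KL per position: sum over the vocab of p * (log p - log q)
    per_token = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(per_token.mean())


# Toy example: one position, two-token vocab.
ref = np.array([[0.0, 0.0]])            # softmax -> [0.50, 0.50]
quant = np.array([[np.log(3.0), 0.0]])  # softmax -> [0.75, 0.25]
print(mean_kld(ref, quant))             # 0.5 * ln(4/3), about 0.1438
```

A value of 0 means the quantized model reproduces the reference distribution exactly at every position; larger values mean more divergence. Because the number averages over all positions and the whole vocabulary, it does not by itself say how a specific eval will move, which is consistent with the point above that evals still matter.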
