WeSearch
SEARCH · QUANTIZATION


5 stories match your query across our 700+ source catalog. Ranked by relevance and recency.


LOCALLLAMA

AMD GPUs are faster at prefilling

I gave the same prompt and the same document to a 1660 Ti running Gemma 4 e2b q4 (because of its small VRAM) and to an iGPU running Gemma 4 e4b q8. The prefill rate before token generation was about 4-5 times faste…

· 3 views
REDDIT

AMD Hipfire - a new inference engine optimized for AMD GPUs

Came across Hipfire the other day. It's a brand-new inference engine focused on all AMD GPUs (not just the latest). GitHub. It uses a special mq4 quantization method. The Hipfire creator is pumping o…

· 6 views
REDDIT

Are Unsloth models as good as I read?

Has anybody done any comparisons between the models that Unsloth offers and their counterparts? For example: I've been using qwen3.6:35b-a3b Q4_K_M, and on my MBP 64GB I get around 39 t/s. Using Unslot…

· 6 views
REDDIT

MagicQuant (v2.0) - Hybrid Mixed GGUF Models + New Unsloth Dynamic Learned Configs

MagicQuant v2.0 is here. Introducing hybrid mixed GGUF models, learned Unsloth Dynamic tensors, and a new benchmark philosophy that skips the nonsense! Smaller files. Better KLD trade-offs. Mag…

· 6 views
REDDIT

Higher precision or higher parameter count?

I’m wondering: if we take models of the same family (e.g. qwen3.5 MoEs) and compare GGUFs of different parameter counts and different quantizations but similar file sizes, which model would be better fo…

· 7 views