2 results for "hybrid attention"
MACHINE LEARNING
Going from 3B/7B dense to Nemotron 3 Nano (hybrid Mamba-MoE) for multi-task reasoning — what changes in the fine-tuning playbook? [D]
Following up on something I posted a few days back about fine-tuning for multi-task reasoning. Read a lot since then, and I've moved past the dense 3B vs 7B question — landing on Nemotron 3 Nano (the …
REDDIT
FINAL-Bench/Darwin-36B-Opus · Hugging Face
Darwin-36B-Opus is a 36-billion-parameter mixture-of-experts (MoE) language model produced by the Darwin V7 evolutionary breeding engine from two publicly available parents: Father: Qwen/Qwen3.6-35B-…