Search: "mamba" — WeSearch Press

5 stories match your query across our catalog of 700+ sources, ranked by relevance and recency.

5 results for "mamba"

AdaMamba: Adaptive Frequency-Gated Mamba for Long-Term Time Series Forecasting

Accurate long-term time series forecasting (LTSF) requires the capture of complex long-range dependencies and dynamic periodic patterns. Recent advances in frequency-domain analysis offer a global per…

Tue, 28 Apr 2026 04:13:21 GMT · 5 views

MACHINE LEARNING

Going from 3B/7B dense to Nemotron 3 Nano (hybrid Mamba-MoE) for multi-task reasoning — what changes in the fine-tuning playbook? [D]

Following up on something I posted a few days back about fine-tuning for multi-task reasoning. Read a lot since then, and I've moved past the dense 3B vs 7B question — landing on Nemotron 3 Nano (the …

Sun, 26 Apr 2026 16:10:10 GMT · 10 views

ARXIV.ORG

Prism: Demystifying Retention and Interaction in Mid-Training

We present PRISM, a comprehensive empirical study of mid-training design choices for large language models. Through controlled experiments across seven base models spanning four families (Granite, LLa…

Thu, 30 Apr 2026 17:09:44 GMT · 4 views

ARXIV CS.AI

The Randomness Floor: Measuring Intrinsic Non-Randomness in Language Model Token Distributions

Language models cannot be random. This paper introduces Entropic Deviation (ED), the normalised KL divergence between a model's token distribution and the uniform distribution, and measures it systema…

Wed, 29 Apr 2026 04:04:25 GMT · 6 views

VERCEL

Disaggregated Serving for Hybrid SSM Models in vLLM

Hybrid architectures that interleave Mamba-style SSM layers with standard full-attention (FA) layers — such as NVIDIA Nemotron-H — are gaining traction as a way…

Tue, 28 Apr 2026 20:46:24 GMT · 5 views

