WeSearch
TAG · #TRANSFORMER-MODELS

Transformer Models coverage.

Every story in the WeSearch catalog tagged with #transformer-models, listed chronologically by publish time, with view counts. Subscribe to the per-tag RSS feed to follow this topic in your reader of choice.

5 stories currently carry this tag. Tag pages update as new stories are ingested.


RELATED TAGS
#ml (4) · #ai (3) · #model-serving (1) · #state-space-models (1) · #distributed-systems (1)
ARXIV CS.AI

Robust Basis Spline Decoupling for the Compression of Transformer Models

Decoupling is a powerful modeling paradigm for representing multivariate functions as compositions of linear transformations and univariate nonlinear functions. A single-layer deco…

14 views
#machine learning · #artificial intelligence · #neural networks
ARXIV CS.AI

Simply Stabilizing the Loop via Fully Looped Transformer

Scaling model performance typically requires increasing model size. Looped Transformer offers a compelling alternative by iteratively reusing the same Transformer blocks, trading a…

14 views
#machine learning · #artificial intelligence
ARXIV CS.AI

Block-Based Double Decoders

Encoder-decoder models offer substantial inference-time savings over decoder-only models, but their pretraining objectives suffer from sparse supervision and dynamic sequence lengt…

21 views
#machine learning · #artificial intelligence
ARXIV CS.AI

Exact Linear Attention

This paper introduces Exact Linear Attention (ELA), a mechanism that achieves linear computational complexity for Transformer attention by leveraging the exact decomposition proper…

15 views
#machine learning · #artificial intelligence
VERCEL

Disaggregated Serving for Hybrid SSM Models in vLLM

Hybrid architectures that interleave Mamba-style SSM layers with standard full-attention (FA) layers — such as NVIDIA Nemotron-H — are gaining traction as a way…

11 views
#machine learning · #model serving · #state-space models