Transformers coverage.

16 views · Mon, 25 May 2026 04:07:35 GMT

Tensor Cache: Eviction-conditioned Associative Memory for Transformers

Autoregressive Transformer KV caches grow linearly with context length; sliding-window caching bounds memory but discards evicted tokens entirely, so relevant evidence outside the …

16 views · Mon, 25 May 2026 04:07:35 GMT

Every Component is a Lookup: Token Attribution and Composition from a Single Decomposition

Mechanistic interpretability of transformers requires identifying not just which components matter but how they compose into the computational route that produced a prediction. Bot…

19 views · Sat, 23 May 2026 08:07:28 GMT

R/STABLEDIFFUSION

SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers

ARXIV.ORG

Coda: Rewriting Transformer Blocks as GEMM-Epilogue Programs

Transformer training systems are built around dense linear algebra, yet a nontrivial fraction of end-to-end time is spent on surrounding memory-bound operators. Normalization, acti…

11 views · Fri, 22 May 2026 05:02:00 GMT

#machine learning #gpu

15 views · Fri, 22 May 2026 04:02:00 GMT

Plug-and-Play Spiking Operators: Breaking the Nonlinearity Bottleneck in Spiking Transformers

ANN-to-SNN conversion offers a practical, training-free route to spiking large language models. However, current pipelines primarily focus on spike-driven realizations for Transfor…

#machine learning #artificial intelligence #neuroscience

16 views · Fri, 22 May 2026 04:02:00 GMT

Weight Decay Regimes in Grokking Transformers: Cheap Online Diagnostics

Transformers trained on modular arithmetic exhibit sharp transitions between memorization, generalization, and collapse. We show that weight decay acts as a scalar empirical contro…

#machine learning #artificial intelligence #neural computing

18 views · Fri, 22 May 2026 04:02:00 GMT

Rethinking Cross-Layer Information Routing in Diffusion Transformers

Diffusion Transformers (DiTs) have become a de facto backbone of modern visual generation, and nearly every major axis of their design -- tokenization, attention, conditioning, obj…

#computer vision #artificial intelligence

13 views · Wed, 20 May 2026 04:04:59 GMT

Position: The Turing-Completeness of Real-World Autoregressive Transformers Relies Heavily on Context Management

Many works make the eye-catching claim that Transformers are Turing-complete. However, the literature often conflates two distinct settings: (i) a fixed Transformer system setting,…

#artificial intelligence #machine learning

14 views · Wed, 20 May 2026 04:04:59 GMT

Emergence of Frontier Superposition: Möbius attractor and Cascade Supervision

Superposition allows Transformers to reason in depth, carrying an entire reasoning frontier in parallel through a bounded-depth forward pass instead of unrolling serial chain-of-th…

17 views · Wed, 20 May 2026 04:04:59 GMT

Precision Tracked Transformer via Kalman Filtering, Kriging and Process Noise

The Transformer is the foundational building block of modern AI, yet offers no principled handling of uncertainty, which is prevalent in real applications: cold-start tokens…

14 views · Wed, 20 May 2026 04:04:59 GMT

Transformers Linearly Represent Highly Structured World Models

Do transformers, when trained on sequential reasoning traces, build internal models of the underlying task? And if so, does the structure of those internal representations mirror t…

12 views · Wed, 20 May 2026 04:04:59 GMT

From Sparsity to Simplicity: Enabling Simpler Sequential Replacements via Sparse Attention Distillation

Self-attention serves as the core foundation of large-scale transformer pretraining, but its quadratic token interaction cost makes inference expensive. Replacing attention with si…

14 views · Wed, 20 May 2026 03:34:59 GMT

I Tested KTransformers on My Laptop — 5 Hidden Features That Made 671B Models Actually Work 🔥

In May 2026, a GitHub project with 17,179 stars quietly achieved what cloud providers spend millions...…

#artificial intelligence #technology #software

14 views · Wed, 20 May 2026 03:34:59 GMT

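5 Hidden Uses of KTransformers: A 671B Model Running at 286 tokens/s on a Single Machine 🔥

In May 2026, an open-source GitHub project with only 17,179 stars did what the major cloud vendors spent millions of dollars to barely pull off: on a single machine, at 286...…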

#technology #artificial intelligence #machine learning

GIZMODO

This Week Feels Like Christmas for Fans of ‘Transformers: The Movie’

'The Apology Tour' for the classic 1986 animated film continues with a few re-releases.…

14 views · Tue, 19 May 2026 19:04:57 GMT

#movies #anniversary

BILLBOARD

Hasbro Is Celebrating 40 Years of ‘The Transformers: The Movie’ With ‘Reformatted’ Soundtrack — And Yes, Stan Bush Is Back

'Transformers: The Movie' at 40: New soundtrack taps Stan Bush, Sebastian Bach and more.…

17 views · Tue, 19 May 2026 12:04:57 GMT

#entertainment #music #anniversary

19 views · Tue, 19 May 2026 03:34:57 GMT

[Day 7] Does Giving an AI More 'Thinking Time' Really Make It Smarter? Training an OpenMythos-Style Mini Model on DGX

Day 7 of my 100-experiment local LLM challenge. Trained a tiny OpenMythos-style mini model (theoretical reconstruction of the rumored Claude Mythos architecture) on multi-digit add…

#ai #machinelearning

R/MOVIES

Official 40th Anniversary Poster for ‘The Transformers: The Movie’ Returning to Theaters September 17

84 views · Mon, 18 May 2026 17:05:00 GMT

HUGGING FACE BLOG

PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend

A Blog post by PaddlePaddle on Hugging Face…

19 views · Mon, 18 May 2026 15:14:56 GMT

#ocr #document parsing

VARIETY

‘Transformers’ Attraction to Launch in Brazil This Year as Hasbro Expands Global Experiences Biz (EXCLUSIVE)

A "Transformers" attraction will open in Brazil later this year, marking the latest live experience from toy giant Hasbro and a huge push into LATAM.…

12 views · Mon, 18 May 2026 14:04:56 GMT

#entertainment #hasbro

GIST

Usual implementation of attention transformers (SDPA) is kind of bad, actually

The usual implementation of attention transformers (SDPA) is kind of bad, actually - antisdpa.md…

15 views · Mon, 18 May 2026 04:34:54 GMT

#artificial intelligence #machine learning #technology

16 views · Mon, 18 May 2026 04:04:54 GMT

Grokking as Structural Inference: Transformers Need Bayesian Lottery Tickets

Why does a Transformer that has memorized its training set wait thousands of steps before it generalizes? Existing accounts locate this delay in norm minimization, feature emergenc…