56 stories tagged with #transformer, in publish-time order across the WeSearch catalog. Tag pages update as new stories ingest.
Two Years Later, We’re Finally Learning How a Transformers-Inspired Rover Fared on the Moon
SORA-Q showed that tiny robots could do big things on the Moon.…
Evaluating Transformer and LSTM Frameworks for Prediction in Ungauged Basins
Watershed networks exhibit convergent topologies in which multiple tributaries merge into downstream channels, integrating diverse upstream hydrological processes. In ungauged basin…
RelGT-AC: A Relational Graph Transformer for Autocomplete Tasks in Relational Databases
Relational databases underpin modern enterprise, scientific, and healthcare systems, yet predictive machine learning on such data remains challenging due to their multi-table, hete…
Practical NLP in the Browser with Transformers.js
This tutorial covers three NLP tasks: text classification, zero-shot labelling, and question answering using Transformers.js's pipeline() API.…
PrismML just released Binary and Ternary Bonsai Image 4B: 1-bit/ternary text-to-image diffusion transformers that can even run 100% locally in your browser on WebGPU.
The Transformer: The Life of a Token
A deep dive into a modern dense transformer: YaRN, hybrid attention, soft capping, QK normalization, FLOPs/token, cluster sizing, and more.…
Transformers Foundation Outlines Why Many Traceability Tools Fail to Meet the Standards of New Regulation
Transformer as an Incomplete Cognitive Architecture: What It Captures Well and What It Misses (A11 Perspective)
Since its introduction, the transformer architecture has become the cornerstone of modern artificial...…
One of the authors of "Attention is All You Need" just argued we should move past it. Pathway’s Post-Transformer debate is worth watching
Ahead of monsoon, CESC sets up transformer banks across taluks
CESC establishes transformer banks across taluks to ensure uninterrupted power supply during the monsoon season in Mysuru and surrounding districts.…
From TF-IDF to Transformers: Implementing Four Generations of Semantic Search
How did semantic search evolve from simple keyword matching into modern transformer-based language understanding? This hands-on article builds four generations of semantic search s…
Tensor Cache: Eviction-conditioned Associative Memory for Transformers
Autoregressive Transformer KV caches grow linearly with context length; sliding-window caching bounds memory but discards evicted tokens entirely, so relevant evidence outside the …
Every Component is a Lookup: Token Attribution and Composition from a Single Decomposition
Mechanistic interpretability of transformers requires identifying not just which components matter but how they compose into the computational route that produced a prediction. Bot…
Most gamers aren't actually using DLSS 4.5's new transformer model — here's why and how to fix it
I wouldn't go with Nvidia's recommended defaults.…
SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers
Transformer failure halts Rapid Metro, triggers blackout in half-a-dozen Gurugram sectors
Transformer failure disrupts Rapid Metro services in Gurugram, causing widespread power outages for nearly 45 minutes.…
Gurugram hit by major power outage after transformer blaze disrupts supply
NEW DELHI: Power outage hit Gurugram after the main transformer at the 220 KVA power station in Sector-72 caught fire, disrupting electricity supply across several parts of the cit…
Coda: Rewriting Transformer Blocks as GEMM-Epilogue Programs
Transformer training systems are built around dense linear algebra, yet a nontrivial fraction of end-to-end time is spent on surrounding memory-bound operators. Normalization, acti…
Plug-and-Play Spiking Operators: Breaking the Nonlinearity Bottleneck in Spiking Transformers
ANN-to-SNN conversion offers a practical, training-free route to spiking large language models. However, current pipelines primarily focus on spike-driven realizations for Transfor…
Weight Decay Regimes in Grokking Transformers: Cheap Online Diagnostics
Transformers trained on modular arithmetic exhibit sharp transitions between memorization, generalization, and collapse. We show that weight decay acts as a scalar empirical contro…
Rethinking Cross-Layer Information Routing in Diffusion Transformers
Diffusion Transformers (DiTs) have become a de facto backbone of modern visual generation, and nearly every major axis of their design -- tokenization, attention, conditioning, obj…
Enphase Energy stock surges on data center transformer opportunity
China's real-life 'transformer' mech is a giant humanoid robot that can switch from bounding on 4 legs to walking on 2
The new 'mecha' robot, which weighs over 1,000 pounds and stands nearly 10 foot tall, is designed for urban mobility.…
Scaling Real-Time Traffic Forecasting with a Graph-Aware Transformer
Learn how Uber deployed a deep transformer model with graph data pipelines to solve real-time traffic forecasting, improving route quality and arrival times for millions of custome…
WorldParticle: Unified World Simulation of Lagrangian Particles via Transformer
A unified simulator that can model diverse physical phenomena without solver-specific redesign is a long-standing goal across simulation science. We present a learning-based partic…
Position: The Turing-Completeness of Real-World Autoregressive Transformers Relies Heavily on Context Management
Many works make the eye-catching claim that Transformers are Turing-complete. However, the literature often conflates two distinct settings: (i) a fixed Transformer system setting,…
Robust Basis Spline Decoupling for the Compression of Transformer Models
Decoupling is a powerful modeling paradigm for representing multivariate functions as compositions of linear transformations and univariate nonlinear functions. A single-layer deco…
Simply Stabilizing the Loop via Fully Looped Transformer
Scaling model performance typically requires increasing model size. Looped Transformer offers a compelling alternative by iteratively reusing the same Transformer blocks, trading a…
Block-Based Double Decoders
Encoder-decoder models offer substantial inference-time savings over decoder-only models, but their pretraining objectives suffer from sparse supervision and dynamic sequence lengt…
Emergence of Frontier Superposition: Möbius attractor and Cascade Supervision
Superposition allows Transformers to reason in depth, carrying an entire reasoning frontier in parallel through a bounded-depth forward pass instead of unrolling serial chain-of-th…
Precision Tracked Transformer via Kalman Filtering, Kriging and Process Noise
The Transformer is the foundational building block of modern AI, yet offers no principled handling of uncertainty, which is prevalent in real applications: cold-start tokens…
Transformers Linearly Represent Highly Structured World Models
Do transformers, when trained on sequential reasoning traces, build internal models of the underlying task? And if so, does the structure of those internal representations mirror t…
Exact Linear Attention
This paper introduces Exact Linear Attention (ELA), a mechanism that achieves linear computational complexity for Transformer attention by leveraging the exact decomposition proper…
From Sparsity to Simplicity: Enabling Simpler Sequential Replacements via Sparse Attention Distillation
Self-attention serves as the core foundation of large-scale transformer pretraining, but its quadratic token interaction cost makes inference expensive. Replacing attention with si…
I Tested KTransformers on My Laptop — 5 Hidden Features That Made 671B Models Actually Work 🔥
In May 2026, a GitHub project with 17,179 stars quietly achieved what cloud providers spend millions...…
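5 Hidden Uses of KTransformers: Running a 671B Model at 286 tokens/s on a Single Machine 🔥
In May 2026, an open-source project with only 17,179 stars on GitHub achieved what the major cloud providers spent millions of dollars to barely pull off: on a single machine, at 286...…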
This Week Feels Like Christmas for Fans of ‘Transformers: The Movie’
'The Apology Tour' for the classic 1986 animated film continues with a few re-releases.…
Hasbro Is Celebrating 40 Years of ‘The Transformers: The Movie’ With ‘Reformatted’ Soundtrack — And Yes, Stan Bush Is Back
'Transformers: The Movie' at 40: New soundtrack taps Stan Bush, Sebastian Bach and more.…
Need reliable source for 30+ years of S&P 500 historical data for LSTM/Transformer research [P]
[Day 7] Does Giving an AI More 'Thinking Time' Really Make It Smarter? Training an OpenMythos-Style Mini Model on DGX
Day 7 of my 100-experiment local LLM challenge. Trained a tiny OpenMythos-style mini model (theoretical reconstruction of the rumored Claude Mythos architecture) on multi-digit add…
Official 40th Anniversary Poster for ‘The Transformers: The Movie’ Returning to Theaters September 17
PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend
A Blog post by PaddlePaddle on Hugging Face…
‘Transformers’ Attraction to Launch in Brazil This Year as Hasbro Expands Global Experiences Biz (EXCLUSIVE)
A "Transformers" attraction will open in Brazil later this year, marking the latest live experience from toy giant Hasbro and a huge push into LATAM.…
Usual implementation of attention transformers (SDPA) is kind of bad, actually
The usual implementation of attention transformers (SDPA) is kind of bad, actually - antisdpa.md…
MR2-ByteTrack: CNN and Transformer-based Video Object Detection for AI-augmented Embedded Vision Sensor Nodes
Modern smart vision sensors need on-device intelligence to process video streams, as cloud computing is often impractical due to bandwidth, latency, and privacy constraints. Howeve…
Grokking as Structural Inference: Transformers Need Bayesian Lottery Tickets
Why does a Transformer that has memorized its training set wait thousands of steps before it generalizes? Existing accounts locate this delay in norm minimization, feature emergenc…
Taming the Spike: Predicting Glucose Peaks 30 Minutes Ahead with Transformers and TensorFlow 🩸🚀
Managing blood glucose is like trying to drive a car where the steering wheel has a 20-minute lag....…
Autoregressive next token prediction and KV Cache in transformers
Understand the optimization technique in LLMs to speed up token generation…
What is transformer architecture?
Made and Published a Paper Comparing Analysis of CNN and Vision Transformer Architectures for Brain Tumor Detection [R]
Recent Developments in LLM Architectures: KV Sharing, MHC, Compressed Attention
From Gemma 4 to DeepSeek V4, How New Open-Weight LLMs Are Reducing Long-Context Costs…
Madras HC orders CBI probe into alleged money laundering in procuring transformers
The bench directed the State Directorate of Vigilance and Anti-Corruption (DVAC) that had been probing the matter until now, to “hand over” all papers and records related to the ca…
Madras High Court orders CBI probe into ₹397-crore transformer procurement during Senthilbalaji’s tenure
Madras High Court orders CBI investigation into ₹397 crore transformer procurement scam during V. Senthilbalaji's tenure as Electricity Minister.…
Disaggregated Serving for Hybrid SSM Models in vLLM
Hybrid architectures that interleave Mamba-style SSM layers with standard full-attention (FA) layers — such as NVIDIA Nemotron-H — are gaining traction as a way…