WeSearch
TAG · #TRANSFORMER

Transformer coverage.

Every story in the WeSearch catalog tagged with #transformer, chronological, with view counts. Subscribe to the per-tag RSS feed to follow this topic in your reader of choice.

56 stories tagged with #transformer, in publish-time order across the WeSearch catalog. Tag pages update as new stories are ingested.


RELATED TAGS
#transformers (15) · #ai (14) · #ml (13) · #transformer-models (4) · #entertainment (2) · #hasbro (2) · #machinelearning (2) · #anniversary (2) · #model-serving (1) · #state-space-models (1) · #distributed-systems (1) · #llm-architecture (1)
GIZMODO

Two Years Later, We’re Finally Learning How a Transformers-Inspired Rover Fared on the Moon

SORA-Q showed that tiny robots could do big things on the Moon.…

11 views ·
ARXIV CS.AI

Evaluating Transformer and LSTM Frameworks for Prediction in Ungauged Basins

Watershed networks exhibit convergent topologies in which multiple tributaries merge into downstream channels, integrating diverse upstream hydrological processes. In ungauged basin…

15 views ·
#artificial intelligence #machine learning #hydrology
ARXIV CS.AI

RelGT-AC: A Relational Graph Transformer for Autocomplete Tasks in Relational Databases

Relational databases underpin modern enterprise, scientific, and healthcare systems, yet predictive machine learning on such data remains challenging due to their multi-table, hete…

17 views ·
#artificial intelligence #machine learning #databases
KDNUGGETS

Practical NLP in the Browser with Transformers.js

This tutorial covers three NLP tasks: text classification, zero-shot labelling, and question answering using Transformers.js's pipeline() API.…

23 views ·
#nlp #javascript #transformers
R/STABLEDIFFUSION

PrismML just released Binary and Ternary Bonsai Image 4B: 1-bit/ternary text-to-image diffusion transformers that can even run 100% locally in your browser on WebGPU.

23 views ·
ALEKSAGORDIC

The Transformer: The Life of a Token

A deep dive into a modern dense transformer: YaRN, hybrid attention, soft capping, QK normalization, FLOPs/token, cluster sizing, and more.…

13 views ·
#technology #artificial intelligence #machine learning
YAHOO FINANCE

Transformers Foundation Outlines Why Many Traceability Tools Fail to Meet the Standards of New Regulation

13 views ·
DEV.TO (TOP)

Transformer as an Incomplete Cognitive Architecture: What It Captures Well and What It Misses (A11 Perspective)

Since its introduction, the transformer architecture has become the cornerstone of modern artificial...…

14 views ·
#ai #architecture #machinelearning
R/SINGULARITY

One of the authors of "Attention is All You Need" just argued we should move past it. Pathway’s Post-Transformer debate is worth watching

24 views ·
R/SINGULARITY

One of the authors of "Attention is All You Need" just argued we should move past it. Pathway’s Post-Transformer debate is worth watching

11 views ·
THE HINDU — TOP

Ahead of monsoon, CESC sets up transformer banks across taluks

CESC establishes transformer banks across taluks to ensure uninterrupted power supply during the monsoon season in Mysuru and surrounding districts.…

16 views ·
#electricity #monsoon #infrastructure
TOWARDS DATA SCIENCE

From TF-IDF to Transformers: Implementing Four Generations of Semantic Search

How did semantic search evolve from simple keyword matching into modern transformer-based language understanding? This hands-on article builds four generations of semantic search s…

14 views ·
#ai #machine learning #semantic search
ARXIV CS.AI

Tensor Cache: Eviction-conditioned Associative Memory for Transformers

Autoregressive Transformer KV caches grow linearly with context length; sliding-window caching bounds memory but discards evicted tokens entirely, so relevant evidence outside the …

16 views ·
#machine learning #artificial intelligence #transformers
ARXIV CS.AI

Every Component is a Lookup: Token Attribution and Composition from a Single Decomposition

Mechanistic interpretability of transformers requires identifying not just which components matter but how they compose into the computational route that produced a prediction. Bot…

15 views ·
#machine learning #artificial intelligence #transformers
XDA DEVELOPERS

Most gamers aren't actually using DLSS 4.5's new transformer model — here's why and how to fix it

I wouldn't go with Nvidia's recommended defaults.…

13 views ·
#gaming #technology #nvidia
R/STABLEDIFFUSION

SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers

18 views ·
THE HINDU — TOP

Transformer failure halts Rapid Metro, triggers blackout in half-a-dozen Gurugram sectors

Transformer failure disrupts Rapid Metro services in Gurugram, causing widespread power outages for nearly 45 minutes.…

15 views ·
#transportation #power outage #gurugram
TIMES OF INDIA — TOP

Gurugram hit by major power outage after transformer blaze disrupts supply

NEW DELHI: Power outage hit Gurugram after the main transformer at the 220 KVA power station in Sector-72 caught fire, disrupting electricity supply across several parts of the cit…

12 views ·
#gurugram #power outage #infrastructure
ARXIV.ORG

Coda: Rewriting Transformer Blocks as GEMM-Epilogue Programs

Transformer training systems are built around dense linear algebra, yet a nontrivial fraction of end-to-end time is spent on surrounding memory-bound operators. Normalization, acti…

11 views ·
#machine learning #transformers #gpu
ARXIV CS.AI

Plug-and-Play Spiking Operators: Breaking the Nonlinearity Bottleneck in Spiking Transformers

ANN-to-SNN conversion offers a practical, training-free route to spiking large language models. However, current pipelines primarily focus on spike-driven realizations for Transfor…

15 views ·
#machine learning #artificial intelligence #neuroscience
ARXIV CS.AI

Weight Decay Regimes in Grokking Transformers: Cheap Online Diagnostics

Transformers trained on modular arithmetic exhibit sharp transitions between memorization, generalization, and collapse. We show that weight decay acts as a scalar empirical contro…

15 views ·
#machine learning #artificial intelligence #neural computing
ARXIV CS.AI

Rethinking Cross-Layer Information Routing in Diffusion Transformers

Diffusion Transformers (DiTs) have become a de facto backbone of modern visual generation, and nearly every major axis of their design -- tokenization, attention, conditioning, obj…

17 views ·
#computer vision #artificial intelligence #transformers
INVESTING.COM — NEWS

Enphase Energy stock surges on data center transformer opportunity

19 views ·
LIVE SCIENCE

China's real-life 'transformer' mech is a giant humanoid robot that can switch from bounding on 4 legs to walking on 2

The new 'mecha' robot, which weighs over 1,000 pounds and stands nearly 10 foot tall, is designed for urban mobility.…

14 views ·
#robotics #technology #innovation
UBER

Scaling Real-Time Traffic Forecasting with a Graph-Aware Transformer

Learn how Uber deployed a deep transformer model with graph data pipelines to solve real-time traffic forecasting, improving route quality and arrival times for millions of custome…

13 views ·
#technology #transportation #artificial intelligence
ARXIV.ORG

WorldParticle: Unified World Simulation of Lagrangian Particles via Transformer

A unified simulator that can model diverse physical phenomena without solver-specific redesign is a long-standing goal across simulation science. We present a learning-based partic…

15 views ·
#computer science #graphics #machine learning
ARXIV CS.AI

Position: The Turing-Completeness of Real-World Autoregressive Transformers Relies Heavily on Context Management

Many works make the eye-catching claim that Transformers are Turing-complete. However, the literature often conflates two distinct settings: (i) a fixed Transformer system setting,…

12 views ·
#artificial intelligence #machine learning #transformers
ARXIV CS.AI

Robust Basis Spline Decoupling for the Compression of Transformer Models

Decoupling is a powerful modeling paradigm for representing multivariate functions as compositions of linear transformations and univariate nonlinear functions. A single-layer deco…

13 views ·
#machine learning #artificial intelligence #neural networks
ARXIV CS.AI

Simply Stabilizing the Loop via Fully Looped Transformer

Scaling model performance typically requires increasing model size. Looped Transformer offers a compelling alternative by iteratively reusing the same Transformer blocks, trading a…

13 views ·
#machine learning #artificial intelligence #transformer models
ARXIV CS.AI

Block-Based Double Decoders

Encoder-decoder models offer substantial inference-time savings over decoder-only models, but their pretraining objectives suffer from sparse supervision and dynamic sequence lengt…

20 views ·
#machine learning #artificial intelligence #transformer models
ARXIV CS.AI

Emergence of Frontier Superposition: Möbius attractor and Cascade Supervision

Superposition allows Transformers to reason in depth, carrying an entire reasoning frontier in parallel through a bounded-depth forward pass instead of unrolling serial chain-of-th…

14 views ·
#machine learning #artificial intelligence #transformers
ARXIV CS.AI

Precision Tracked Transformer via Kalman Filtering, Kriging and Process Noise

The Transformer is the foundational building block of modern AI, yet offers no principled handling of uncertainty, which is prevalent in real applications: cold-start tokens…

17 views ·
#machine learning #artificial intelligence #transformers
ARXIV CS.AI

Transformers Linearly Represent Highly Structured World Models

Do transformers, when trained on sequential reasoning traces, build internal models of the underlying task? And if so, does the structure of those internal representations mirror t…

13 views ·
#machine learning #artificial intelligence #transformers
ARXIV CS.AI

Exact Linear Attention

This paper introduces Exact Linear Attention (ELA), a mechanism that achieves linear computational complexity for Transformer attention by leveraging the exact decomposition proper…

14 views ·
#machine learning #artificial intelligence #transformer models
ARXIV CS.AI

From Sparsity to Simplicity: Enabling Simpler Sequential Replacements via Sparse Attention Distillation

Self-attention serves as the core foundation of large-scale transformer pretraining, but its quadratic token interaction cost makes inference expensive. Replacing attention with si…

11 views ·
#machine learning #artificial intelligence #transformers
DEV.TO (TOP)

I Tested KTransformers on My Laptop — 5 Hidden Features That Made 671B Models Actually Work 🔥

In May 2026, a GitHub project with 17,179 stars quietly achieved what cloud providers spend millions...…

13 views ·
#artificial intelligence #technology #software
DEV.TO (TOP)

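5 Hidden Uses of KTransformers: A 671B Model Running at 286 tokens/s on a Single Machine 🔥

In May 2026, an open-source GitHub project with only 17,179 stars pulled off what major cloud providers spent millions of dollars to barely achieve: on a single machine, at 286...…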

13 views ·
#technology #artificial intelligence #machine learning
GIZMODO

This Week Feels Like Christmas for Fans of ‘Transformers: The Movie’

'The Apology Tour' for the classic 1986 animated film continues with a few re-releases.…

13 views ·
#transformers #movies #anniversary
BILLBOARD

Hasbro Is Celebrating 40 Years of ‘The Transformers: The Movie’ With ‘Reformatted’ Soundtrack — And Yes, Stan Bush Is Back

'Transformers: The Movie' at 40: New soundtrack taps Stan Bush, Sebastian Bach and more.…

16 views ·
#entertainment #music #anniversary
R/MACHINELEARNING

Need reliable source for 30+ years of S&P 500 historical data for LSTM/Transformer research [P]

18 views ·
DEV.TO (TOP)

[Day 7] Does Giving an AI More 'Thinking Time' Really Make It Smarter? Training an OpenMythos-Style Mini Model on DGX

Day 7 of my 100-experiment local LLM challenge. Trained a tiny OpenMythos-style mini model (theoretical reconstruction of the rumored Claude Mythos architecture) on multi-digit add…

18 views ·
#ai #machinelearning #transformers
R/MOVIES

Official 40th Anniversary Poster for ‘The Transformers: The Movie’ Returning to Theaters September 17

84 views ·
HUGGING FACE BLOG

PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend

A Blog post by PaddlePaddle on Hugging Face…

17 views ·
#ocr #transformers #document parsing
VARIETY

‘Transformers’ Attraction to Launch in Brazil This Year as Hasbro Expands Global Experiences Biz (EXCLUSIVE)

A "Transformers" attraction will open in Brazil later this year, marking the latest live experience from toy giant Hasbro and a huge push into LATAM.…

12 views ·
#entertainment #hasbro #transformers
GIST

Usual implementation of attention transformers (SDPA) is kind of bad, actually

The usual implementation of attention transformers (SDPA) is kind of bad, actually - antisdpa.md…

14 views ·
#artificial intelligence #machine learning #technology
ARXIV CS.AI

MR2-ByteTrack: CNN and Transformer-based Video Object Detection for AI-augmented Embedded Vision Sensor Nodes

Modern smart vision sensors need on-device intelligence to process video streams, as cloud computing is often impractical due to bandwidth, latency, and privacy constraints. Howeve…

15 views ·
#computer vision #artificial intelligence #embedded systems
ARXIV CS.AI

Grokking as Structural Inference: Transformers Need Bayesian Lottery Tickets

Why does a Transformer that has memorized its training set wait thousands of steps before it generalizes? Existing accounts locate this delay in norm minimization, feature emergenc…

16 views ·
#machine learning #artificial intelligence #transformers
DEV.TO (TOP)

Taming the Spike: Predicting Glucose Peaks 30 Minutes Ahead with Transformers and TensorFlow 🩸🚀

Managing blood glucose is like trying to drive a car where the steering wheel has a 20-minute lag....…

12 views ·
#health #technology #machinelearning
MEDIUM

Autoregressive next token prediction and KV Cache in transformers

Understand the optimization technique in LLMs to speed up token generation…

14 views ·
#technology #artificial intelligence #machine learning
R/STABLEDIFFUSION

What is transformer architecture?

16 views ·
R/MACHINELEARNING

Made and Published a Paper Comparing Analysis of CNN and Vision Transformer Architectures for Brain Tumor Detection [R]

16 views ·
HACKER NEWS (AI / LLM)

Recent Developments in LLM Architectures: KV Sharing, MHC, Compressed Attention

From Gemma 4 to DeepSeek V4, How New Open-Weight LLMs Are Reducing Long-Context Costs…

12 views ·
#llm architecture #attention mechanisms #memory efficiency
HINDUSTAN TIMES — TOP

Madras HC orders CBI probe into alleged money laundering in procuring transformers

The bench directed the State Directorate of Vigilance and Anti-Corruption (DVAC) that had been probing the matter until now, to “hand over” all papers and records related to the ca…

12 views ·
#corruption #investigation #tenders
THE HINDU — TOP

Madras High Court orders CBI probe into ₹397-crore transformer procurement during Senthilbalaji’s tenure

Madras High Court orders CBI investigation into ₹397 crore transformer procurement scam during V. Senthilbalaji's tenure as Electricity Minister.…

14 views ·
#corruption #investigation #politics
VERCEL

Disaggregated Serving for Hybrid SSM Models in vLLM

Hybrid architectures that interleave Mamba-style SSM layers with standard full-attention (FA) layers — such as NVIDIA Nemotron-H — are gaining traction as a way…

10 views ·
#machine learning #model serving #state-space models
ALL NEWS

Astor Enerji shares rise 3% on $51.5M transformer deal

12 views ·