WeSearch
SEARCH · TRANSFORMER MODELS


9 stories match your query across our 700+ source catalog. Ranked by relevance and recency.


ARXIV CS.AI

The Spectral Lifecycle of Transformer Training: Transient Compression Waves, Persistent Spectral Gradients, and the Q/K–V Asymmetry

We present the first systematic study of weight matrix singular value spectra *during* transformer pretraining, tracking full SVD decompositions of every weight matrix at 25-step intervals across…

4 views
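The abstract above describes tracking the full singular value spectrum of every weight matrix at fixed step intervals during pretraining. A minimal sketch of that bookkeeping using NumPy; the matrix names, shapes, and logging interval here are illustrative stand-ins, not details from the paper:

```python
import numpy as np

def spectrum(W):
    """Singular values of a weight matrix, sorted in descending order."""
    return np.linalg.svd(W, compute_uv=False)

# Toy stand-ins for transformer weight matrices; real code would pull
# these from the model's state dict at each logging step.
rng = np.random.default_rng(0)
weights = {"attn.q_proj": rng.normal(size=(8, 8)),
           "attn.v_proj": rng.normal(size=(8, 8))}

# Record the full spectrum of every matrix every 25 steps.
history = {name: [] for name in weights}
for step in range(0, 100, 25):
    for name, W in weights.items():
        history[name].append((step, spectrum(W)))
```

From `history` one can then plot how each matrix's spectrum evolves over training steps.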
ARXIV CS.AI

BiTA: Bidirectional Gated Recurrent Unit-Transformer Aggregator in a Temporal Graph Network Framework for Alert Prediction in Computer Networks

Proactive alert prediction in computer networks is critical for mitigating evolving cyber threats and enabling timely defensive actions. Temporal Graph Neural Networks (TGNs) provide a principled fram…

4 views
ROBBYANT 蚂蚁灵波科技

LingBot-Map: Streaming 3D reconstruction with geometric context transformer

Technology-driven and application-oriented. We build foundational large models for embodied AI: spatial perception (LingBot-Depth), VLA (LingBot-VLA), world models (LingBot-World), video action (LingB…

5 views
ARXIV CS.AI

The Randomness Floor: Measuring Intrinsic Non-Randomness in Language Model Token Distributions

Language models cannot be random. This paper introduces Entropic Deviation (ED), the normalised KL divergence between a model's token distribution and the uniform distribution, and measures it systema…

4 views
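The snippet defines Entropic Deviation as the normalised KL divergence between a model's token distribution and the uniform distribution. A minimal sketch, assuming normalisation by the maximum possible value log V (the paper's exact normalisation is not shown in the snippet):

```python
import numpy as np

def entropic_deviation(p):
    """Normalised KL divergence between token distribution p and the
    uniform distribution over the same vocabulary: 0 for a uniform
    distribution, 1 for a one-hot distribution."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    V = len(p)
    u = 1.0 / V
    mask = p > 0                       # 0 * log 0 := 0
    kl = np.sum(p[mask] * np.log(p[mask] / u))
    return kl / np.log(V)              # normalise by the maximum, log V
```

With this normalisation, a perfectly "random" model (uniform next-token distribution) scores 0 and a fully deterministic one scores 1, so the measured floor sits somewhere in between.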
ARXIV CS.AI

Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing

Serving transformer language models with high throughput requires caching Key-Values (KVs) to avoid redundant computation during autoregressive generation. The memory footprint of KV caching is signif…

4 views
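The abstract refers to KV caching: during autoregressive generation, each step's key/value tensors are stored so that earlier tokens are never re-encoded, at the cost of a cache that grows with sequence length. A minimal single-head sketch of one decode step (shapes and names are illustrative, and this shows plain per-layer caching, not the paper's routing scheme):

```python
import numpy as np

def attend_with_cache(q, new_k, new_v, cache_k, cache_v):
    """One decode step: append the new token's K/V to the cache,
    then attend the query over all cached positions."""
    cache_k = np.concatenate([cache_k, new_k], axis=0)   # (seq, d)
    cache_v = np.concatenate([cache_v, new_v], axis=0)   # (seq, d)
    scores = cache_k @ q / np.sqrt(q.shape[0])           # (seq,)
    w = np.exp(scores - scores.max())
    w /= w.sum()                                         # softmax weights
    out = w @ cache_v                                    # (d,)
    return out, cache_k, cache_v
```

The memory cost is visible directly: the cache holds two (seq, d) arrays per head per layer, which is the footprint that cache-sharing schemes like the one above aim to reduce.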
ARXIV CS.AI

Applied AI-Enhanced RF Interference Rejection

AI-enhanced interference rejection in radio frequency (RF) transmissions has recently attracted interest because deep learning approaches trained on both the signal of interest (SOI) and the signal mi…

4 views
ARXIV CS.AI

MAE-Based Self-Supervised Pretraining for Data-Efficient Medical Image Segmentation Using nnFormer

Transformer architectures, including nnFormer, have demonstrated promising results in volumetric medical image segmentation by capturing long-range spatial interactions. Although they have …

4 views
STABLEDIFFUSION

Ernie VS Qwen and ZiT - Big Test

A large test of 100 images in a gallery. Big image generator showdown: 100 prompts, 3 models, 1 winner. This comparison brings together three open image models with very different strengths. ERNIE-Imag…

6 views
ARXIV.ORG

Beyond the Attention Stability Boundary: Agentic Self-Synthesizing Reasoning Protocols

As LLM agents transition to autonomous digital coworkers, maintaining deterministic goal-directedness in non-linear multi-turn conversations has emerged as an architectural bottleneck. We identify and for…

4 views