9 results for "transformer models"
The Spectral Lifecycle of Transformer Training: Transient Compression Waves, Persistent Spectral Gradients, and the Q/K–V Asymmetry
We present the first systematic study of weight matrix singular value spectra *during* transformer pretraining, tracking the full singular value decomposition (SVD) of every weight matrix at 25-step intervals across…
BiTA: Bidirectional Gated Recurrent Unit-Transformer Aggregator in a Temporal Graph Network Framework for Alert Prediction in Computer Networks
Proactive alert prediction in computer networks is critical for mitigating evolving cyber threats and enabling timely defensive actions. Temporal Graph Neural Networks (TGNs) provide a principled fram…
LingBot-Map: Streaming 3D reconstruction with geometric context transformer
Technology-driven and application-oriented. We build foundational large models for embodied AI: spatial perception (LingBot-Depth), VLA (LingBot-VLA), world models (LingBot-World), video action (LingB…
The Randomness Floor: Measuring Intrinsic Non-Randomness in Language Model Token Distributions
Language models cannot be random. This paper introduces Entropic Deviation (ED), the normalised KL divergence between a model's token distribution and the uniform distribution, and measures it systema…
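The metric described above is concrete enough to sketch. The snippet below is a minimal illustration, not the paper's implementation: it computes KL divergence from a token distribution to the uniform distribution, and assumes the normalisation is division by log(V) (the maximum attainable KL for vocabulary size V), which the truncated abstract does not confirm.

```python
import numpy as np

def entropic_deviation(probs):
    """Sketch of Entropic Deviation (ED): the KL divergence between a
    model's token distribution and the uniform distribution, normalised.

    Assumption (not stated in the snippet above): we normalise by
    log(V), the maximum possible KL value, so ED lies in [0, 1].
    """
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()                     # ensure a valid distribution
    V = len(p)
    u = 1.0 / V                         # uniform probability per token
    nz = p > 0                          # zero-probability terms contribute 0
    kl = np.sum(p[nz] * np.log(p[nz] / u))
    return kl / np.log(V)
```

Under this convention a uniform distribution scores 0 (perfectly random) and a one-hot distribution scores 1 (fully deterministic).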
Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing
Serving transformer language models with high throughput requires caching Key-Values (KVs) to avoid redundant computation during autoregressive generation. The memory footprint of KV caching is signif…
Applied AI-Enhanced RF Interference Rejection
AI-enhanced interference rejection in radio frequency (RF) transmissions has recently attracted interest because deep learning approaches trained on both the signal of interest (SOI) and the signal mi…
MAE-Based Self-Supervised Pretraining for Data-Efficient Medical Image Segmentation Using nnFormer
Transformer architectures, including nnFormer, have demonstrated promising results in volumetric medical image segmentation by capturing long-range spatial interactions. Although they have …
Ernie VS Qwen and ZiT - Big Test
A large test of 100 images in a gallery. Big image generator showdown: 100 prompts, 3 models, 1 winner. This comparison brings together three open image models with very different strengths. ERNIE-Imag…
Beyond the Attention Stability Boundary: Agentic Self-Synthesizing Reasoning Protocols
As LLM agents transition to autonomous digital coworkers, maintaining deterministic goal-directedness in non-linear multi-turn conversations has emerged as an architectural bottleneck. We identify and for…