8 results for "model scaling"
MetaEarth3D: Unlocking World-scale 3D Generation with Spatially Scalable Generative Modeling
Recent generative AI models have achieved remarkable breakthroughs in language and visual understanding. However, although these models can generate realistic visual content, their spatial scale remai…
Why Data Quality is Becoming More Important Than Model Size in Modern AI Systems
For years, progress in artificial intelligence was closely tied to scaling laws, where increasing...…
Estimating Black-Box LLM Parameter Counts via Factual Capacity
Closed-source frontier labs do not disclose parameter counts, and the standard alternative -- inference economics -- carries $2\times$+ uncertainty from hardware, batching, and serving-stack assumptio…
Beyond 80/20: High-Entropy Minority Tokens Drive Effective RL for LLM Reasoning
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful approach to enhancing the reasoning capabilities of Large Language Models (LLMs), while its mechanisms are not yet well …
Can Current Agents Close the Discovery-to-Application Gap? A Case Study in Minecraft
Discovering causal regularities and applying them to build functional systems--the discovery-to-application loop--is a hallmark of general intelligence, yet evaluating this capacity has been hindered …
The Spectral Lifecycle of Transformer Training: Transient Compression Waves, Persistent Spectral Gradients, and the Q/K--V Asymmetry
We present the first systematic study of weight matrix singular value spectra \emph{during} transformer pretraining, tracking full SVD decompositions of every weight matrix at 25-step intervals across…
FreqFormer: Hierarchical Frequency-Domain Attention with Adaptive Spectral Routing for Long-Sequence Video Diffusion Transformers
Long-sequence video diffusion transformers hit a quadratic self-attention cost that dominates runtime and memory for very long token sequences. Most efficient attention methods use one approximation e…
Don't Make the LLM Read the Graph: Make the Graph Think
We investigate whether explicit belief graphs improve LLM performance in cooperative multi-agent reasoning. Through 3,000+ controlled trials across four LLM families in the cooperative card game Hanab…