WeSearch
SEARCH · MODEL SCALING

Results for "model scaling".

8 stories match your query across our 700+ source catalog, ranked by relevance and recency.

ARXIV CS.AI

MetaEarth3D: Unlocking World-scale 3D Generation with Spatially Scalable Generative Modeling

Recent generative AI models have achieved remarkable breakthroughs in language and visual understanding. However, although these models can generate realistic visual content, their spatial scale remai…

6 views
DEV.TO (TOP)

Why Data Quality is Becoming More Important Than Model Size in Modern AI Systems

For years, progress in artificial intelligence was closely tied to scaling laws, where increasing…

6 views
ARXIV.ORG

Estimating Black-Box LLM Parameter Counts via Factual Capacity

Closed-source frontier labs do not disclose parameter counts, and the standard alternative -- inference economics -- carries 2×+ uncertainty from hardware, batching, and serving-stack assumptio…

5 views
ARXIV.ORG

Beyond 80/20: High-Entropy Minority Tokens Drive Effective RL for LLM Reasoning

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful approach to enhancing the reasoning capabilities of Large Language Models (LLMs), while its mechanisms are not yet well …

9 views
ARXIV CS.AI

Can Current Agents Close the Discovery-to-Application Gap? A Case Study in Minecraft

Discovering causal regularities and applying them to build functional systems--the discovery-to-application loop--is a hallmark of general intelligence, yet evaluating this capacity has been hindered …

5 views
ARXIV CS.AI

The Spectral Lifecycle of Transformer Training: Transient Compression Waves, Persistent Spectral Gradients, and the Q/K–V Asymmetry

We present the first systematic study of weight matrix singular value spectra *during* transformer pretraining, tracking full SVD decompositions of every weight matrix at 25-step intervals across…

6 views
ARXIV CS.AI

FreqFormer: Hierarchical Frequency-Domain Attention with Adaptive Spectral Routing for Long-Sequence Video Diffusion Transformers

Long-sequence video diffusion transformers hit a quadratic self-attention cost that dominates runtime and memory for very long token sequences. Most efficient attention methods use one approximation e…

6 views
ARXIV.ORG

Don't Make the LLM Read the Graph: Make the Graph Think

We investigate whether explicit belief graphs improve LLM performance in cooperative multi-agent reasoning. Through 3,000+ controlled trials across four LLM families in the cooperative card game Hanab…

5 views