WeSearch
SEARCH · MODEL SCALING

Results for "model scaling".

8 stories match your query across our 700+ source catalog, ranked by relevance and recency.

ARXIV CS.AI

MetaEarth3D: Unlocking World-scale 3D Generation with Spatially Scalable Generative Modeling

Recent generative AI models have achieved remarkable breakthroughs in language and visual understanding. However, although these models can generate realistic visual content, their spatial scale remai…

6 views
DEV.TO (TOP)

Why Data Quality is Becoming More Important Than Model Size in Modern AI Systems

For years, progress in artificial intelligence was closely tied to scaling laws, where increasing…

6 views
ARXIV.ORG

Estimating Black-Box LLM Parameter Counts via Factual Capacity

Closed-source frontier labs do not disclose parameter counts, and the standard alternative -- inference economics -- carries 2×+ uncertainty from hardware, batching, and serving-stack assumptio…

5 views
ARXIV.ORG

Beyond 80/20: High-Entropy Minority Tokens Drive Effective RL for LLM Reasoning

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful approach to enhancing the reasoning capabilities of Large Language Models (LLMs), while its mechanisms are not yet well …

9 views
ARXIV CS.AI

Can Current Agents Close the Discovery-to-Application Gap? A Case Study in Minecraft

Discovering causal regularities and applying them to build functional systems--the discovery-to-application loop--is a hallmark of general intelligence, yet evaluating this capacity has been hindered …

5 views
ARXIV CS.AI

The Spectral Lifecycle of Transformer Training: Transient Compression Waves, Persistent Spectral Gradients, and the Q/K–V Asymmetry

We present the first systematic study of weight matrix singular value spectra *during* transformer pretraining, tracking full SVD decompositions of every weight matrix at 25-step intervals across…

6 views
ARXIV CS.AI

FreqFormer: Hierarchical Frequency-Domain Attention with Adaptive Spectral Routing for Long-Sequence Video Diffusion Transformers

Long-sequence video diffusion transformers hit a quadratic self-attention cost that dominates runtime and memory for very long token sequences. Most efficient attention methods use one approximation e…

6 views
ARXIV.ORG

Don't Make the LLM Read the Graph: Make the Graph Think

We investigate whether explicit belief graphs improve LLM performance in cooperative multi-agent reasoning. Through 3,000+ controlled trials across four LLM families in the cooperative card game Hanab…

5 views