#language-model — Tagged Stories

Every story in the WeSearch catalog tagged with #language-model, listed chronologically with view counts. Subscribe to the per-tag RSS feed to follow this topic in your reader of choice.

60 stories tagged with #language-model, in publish-time order across the WeSearch catalog. Tag pages update as new stories are ingested.

RELATED TAGS

#language-models (124), #ai (123), #ml (76), #technology (6), #large-language-models (6), #reinforcement-learning (4), #research (4), #ai-research (3), #computation (3), #anthropic (2), #document-editing (2), #reasoning (2)

ARXIV.ORG

GPU Forecasters: Language Models as Selective Surrogates for Kernel Optimization

GPU kernels are the workhorse of modern deep learning, and optimizing them (via evolutionary search or coding agents) usually requires repeated measurement on target hardware. Whil…

11 views · Wed, 03 Jun 2026 05:11:55 GMT

#machine learning #artificial intelligence #gpu optimization

ARXIV.ORG

Benchmarking LLM-as-a-Judge for Long-Form Output Evaluation

As large language models (LLMs) are increasingly used for long-form generation, reliably evaluating long-form outputs has become a critical challenge. LLM-as-a-judge offers a scala…

Language Model coverage.

GPU Forecasters: Language Models as Selective Surrogates for Kernel Optimization

Benchmarking LLM-as-a-Judge for Long-Form Output Evaluation

Visual Graph Scaffolds for Structural Reasoning in Large Language Models

ChatHealthAI: Aligning Electronic Health Record Representations with Large Language Models for Grounded Clinical Reasoning

SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale

The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs

Decomposing how prompting steers behavior

Uncertainty-Aware Clarification in LLM Agents with Information Gain

ClinicalMC: A Benchmark for Multi-Course Clinical Decision-Making with Large Language Models

From Answers to States: Verifiable Process-Level Evaluation of Chemical Reasoning in Large Language Models

Code-on-Graph: Iterative Programmatic Reasoning via Large Language Models on Knowledge Graphs

Why Are Large Language Models So Terrible at Video Games?

Parallax: Parameterized Local Linear Attention for Language Modeling

Scaling Laws for Agent Harnesses via Effective Feedback Compute

Heuristic Parasites: A Behavioral Taxonomy of Recurrent Distortion Patterns in Large Language Models (Full System) V2

AI Propaganda factories with language models

✨📊 🧠 The Ultimate Visual Guide to Large Language Models (LLMs)

📄Paper: RORA-VLM: Robust Retrieval Augmentation for Vision Language Models

LLMs believe false statements even after explicit warnings that they're false

Why does AI love writing about lighthouse keepers?

How sure is the activation oracle?

Can LLMs Introspect? A Reality Check

Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions

OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling

Reasoning, Code, or Both? How Large Language Models Handle Variations in Math Questions

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

MedGuideX: Internalizing Decision Logic from Executable Guidelines into Large Language Models for Clinical Reasoning

AGORA: Adapter-Grounded Observation-Action Retention for Inference-Free Prompt Compression in LLM Agents

Towards Feedback-to-Plan Decisions for Self-Evolving LLM Agents in CUDA Kernel Generation

The Attribution Blind Spot: Detecting When Language Models Rely on Memory Rather Than Retrieved Context

What Makes Chain-of-Thought Work at Probe Time? Local Co-occurrence Rather Than Global Derivation

Multi-Stakeholder LLM Alignment: Decomposing Estimation from Aggregation

Generating Robust Portfolios of Optimization Models using Large Language Models

Counteraction-Aware Multi-Teacher On-Policy Distillation for General Capability Recovery with Domain Preservation

Detecting Is Not Resolving: The Monitoring Control Gap in Retrieval Augmented LLMs

Gumbel Machine: Counterfactual Student Writing Generation via Gumbel Noise Steering

Pretraining Data Exposure in Large Language Models: A Survey of Membership Inference, Data Contamination, and Security Implications

PitchBench: Measuring Pitch Hearing in Audio-Language Models

Microsoft Research: LLMs Corrupt your files during delegated work

Sparse Autoencoders Reveal Cortical Brain-LLM Semantic Mapping

Prompt Politeness Affects LLM Accuracy

You don't need all the LLM benchmarks

In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models

Confidence Calibration in Large Language Models

How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning

Why We Need World Models for AGI: Where LLMs Fail and How World Models May Outperform

LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition

Breaking the Chains of Probability: Neutrosophic Logic as a New Framework for Epistemic Uncertainty in Large Language Models

HyperGuide: Hyperbolic Guidance for Efficient Multi-Step Reasoning in Large Language Models

Inference Time Context Sparsity: Illusion or Opportunity?

Distilling Game Code World Model Generation into Lightweight Large Language Models

Understanding and Mitigating Premature Confidence for Better LLM Reasoning

Hypothesis Generation and Inductive Inference in Children and Language Models

PALoRA: Projection-Adaptive LoRA for Preserving Reasoning in Large Language Models

Jailbreak to Protect: Buffering and Reinforcing via Temporary Jailbreaking for Safe Fine-Tuning in Large Language Models

Summoning the Oracle to Slay It: Mitigating Look-Ahead Bias in Financial Backtesting with Large Language Models

Learning to Reason Efficiently with A* Post-Training

When Mean CE Fails: Median CE Can Better Track Language Model Quality

Emotional intelligence in large language models is fragmented across perception, cognition, and interaction

Automated Detection and Classification of Delusion-related Content in Naturalistic Audio Diaries Using Multi-Agent Language Models
