60 stories tagged with #reasoning, in publish-time order across the WeSearch catalog. Tag pages update as new stories are ingested.
Visual Graph Scaffolds for Structural Reasoning in Large Language Models
Graphs have been used to enhance large language models (LLMs) for structured reasoning, mostly as external knowledge sources provided to models at test time. In this paper, we …
ChatHealthAI: Aligning Electronic Health Record Representations with Large Language Models for Grounded Clinical Reasoning
Large language models (LLMs) exhibit strong natural-language reasoning abilities for clinical decision support, but struggle to effectively model structured longitudinal electronic…
Thinking Past the Answer: Evaluating Harmful Overthinking in Large Reasoning Models
Large Reasoning Models (LRMs) improve performance by generating explicit intermediate reasoning traces through increased test-time compute, yet the assumption that longer reasoning…
Inducing Reasoning Primitives from Agent Traces
ReAct-style LLM agents often rediscover the same reasoning routines across problems, yet leave those routines trapped in transient scratchpads. We introduce Reasoning Primitive Ind…
CORE: Conflict-Oriented Reasoning for General Multimodal Manipulation Detection
The rapid rise of generative AI has made multimodal fake news increasingly realistic and pervasive, posing severe threats to public trust and social stability. Existing detection m…
The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs
Inference-time scaling has emerged as a critical avenue for enhancing Large Language Models' performance, yet real-world deployment is constrained by strict computational budgets. …
Perceive Before Reasoning: A Pre-Reasoning Perception Framework for Efficient and Reliable Proactive Mobile Agents
Multimodal large language models (MLLMs) have substantially advanced mobile agents, yet proactive mobile assistance remains challenging because agents must decide when to in…
CP-Agent: Context-Aware Multimodal Reasoning for Cellular Morphological Profiling under Chemical Perturbations
Cell Painting combines multiplexed fluorescent staining, high-content imaging, and quantitative analysis to generate high-dimensional phenotypic readouts to support diverse downstr…
ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning
Large Reasoning Models (LRMs) have achieved remarkable progress thanks to Reinforcement Learning with Verifiable Rewards (RLVR) on Chain-of-Thoughts (CoTs). However, since long CoT…
Bridging Auxiliary Constraints to Resolve Instruction Following in Large Reasoning Models
Large Reasoning Models (LRMs) have demonstrated impressive capabilities in many tasks, yet they struggle with reliably following multiple instructions, either by failing to satisfy…
TSQAgent: Rating Time Series Data Quality via Dedicated Agentic Reasoning
Assessing the quality of time series (TS) data is fundamental yet inherently challenging due to the multifaceted nature of quality dimensions. Recently, large language models (LLMs…
From Answers to States: Verifiable Process-Level Evaluation of Chemical Reasoning in Large Language Models
Large language models are increasingly used as chemistry assistants, yet most chemistry benchmarks still score only final answers. This masks a critical failure mode: a model may o…
Code-on-Graph: Iterative Programmatic Reasoning via Large Language Models on Knowledge Graphs
Knowledge Graphs (KGs) are widely used to mitigate the limitations of Large Language Models (LLMs), such as outdated knowledge and hallucinations. Existing LLM-KG integration frame…
Unveiling the Structure of Do-Calculus Reasoning via Derivation Graphs
The do-calculus defines a general system of inference for interventional queries, allowing causal quantities to be transformed through successive applications of its rules. This pr…
When to Re-Plan: Subgoal Persistence in Hierarchical Latent Reasoning
Long-horizon reasoning requires a system to commit to medium-horizon intent without becoming rigid: re-plan too often and computation never coheres into multi-step structure; commi…
Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action
A Blog post by NVIDIA on Hugging Face…
Claude Does Not Need More Prompts. It Needs Reasoning Discipline.
Large language models are good at sounding structured. That is not the same as being structured. Ask...…
The Evil of corporate America and their reasoning skills is that of people who enter a building to find the exit.
Gryphe/Pantheon-Reasoning-27B · Hugging Face
My Stepmom Won’t Come to My House to See My Baby. Her Reasoning Is Not Normal.
This cannot be healthy.…
Cassandra: Enabling Reasoning LLMs at Edge via Self-Speculative Decoding
Speculative decoding has emerged as a promising lossless approach for accelerating Large Language Models (LLMs). As reasoning LLMs increasingly suffer from decode-stage overhead an…
Fooling around with encrypted reasoning blobs
This is a quick post I wanted to write about a “hobby project” I spent a weekend on. It has little to do with real cryptography, and mostly doesn’t expose a particularly exciting ……
AutoTTS reduces token usage by 69.5% in LLM reasoning strategies
AutoTTS, a framework from Meta, Google, and university researchers, cuts LLM token usage by 69.5% while maintaining accuracy, with implications for AI-driven crypto tools.…
Researchers automated LLM reasoning strategy design and cut token usage by 69.5%
I gave my AI agents email instead of better reasoning. They started fixing each other's bugs.
What's the reasoning behind not letting us download our entire SMS messages as easy as possible?
Reasoning, Code, or Both? How Large Language Models Handle Variations in Math Questions
Large Language Models (LLMs) achieve impressive accuracy on mathematical reasoning benchmarks, yet their performance drops when problems are modified with simple changes like diffe…
Which Changes Matter? Towards Trustworthy Legal AI via Relevance-Sensitive Evaluation and Solver-Grounded Reasoning
Legal reasoning requires distinguishing changes that matter from those that do not. Legal AI should remain stable under legally irrelevant perturbations, but should change when per…
MedGuideX: Internalizing Decision Logic from Executable Guidelines into Large Language Models for Clinical Reasoning
Clinical practice guidelines (CPGs) encode evidence-based decision logic that clinicians apply by evaluating patient variables, conditional criteria, and recommendation rules. Howe…
Composition Collapse: Stable Factual Knowledge Does Not Imply Compositional Reasoning
Post-training is routinely evaluated through aggregate benchmark scores that treat multi-hop reasoning as a single capability -- as if a model that answers more questions correctly…
Traceable Knowledge Graph Reasoning Enables LLM-Assisted Decision Support for Industrial VOCs in the Steel Industry
Key knowledge for steel-industry volatile organic compounds (VOCs) governance is scattered across unstructured scientific literature, making it difficult to integrate process, poll…
Scaling, Benchmarking, and Reasoning of Vision-Language Agents for Mobile GUI Navigation
Vision-Language Models (VLMs) have shown rapid progress in mobile GUI navigation. This paper presents a systematic study of data scaling, benchmarking, and reasoning for VLM-based …
RepoMirage: Probing Repository Context Reasoning in Code Agents with Perturbations
Code agents currently perform well on repository-level software engineering benchmarks, but it remains unclear whether success on end-to-end tasks such as issue …
5.5/5.4 Reasoning Cheatsheet (YMMV)
Verbosity is not faithfulness: an architectural argument that reasoning models cannot perform faithful inference [D]
What Sudoku reveals about the limits of LLMs
LLM failure to solve reasoning puzzles exposes deep architectural limits…
Show HN: skills-for-humanity – 171 structured reasoning skills for Claude Code
Structured reasoning methodologies from history's most rigorous thinkers, packaged as Claude Code skills. - human-avatar/skills-for-humanity…
How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning
Reasoning-capable large language models solve hard problems by emitting long chains of thought, paying heavily in latency, GPU time, and energy. Casual inspection of their traces r…
DRIVE: Modeling Skills at the Reasoning and Interaction Levels for Web Agents under Continual Learning
Web agents require both high-level reasoning (for task decomposition) and low-level interactions (for page elements manipulation) to conduct different tasks. However, these knowled…
Residual Drift Dominates Contradiction in Multi-Turn Constraint Reasoning
How do multi-turn reasoning systems fail? The expected answer is logical contradiction, in which the system's maintained state becomes unsatisfiable. We show that the dominant mode…
LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs
Large Language Models (LLMs) achieve strong performance on logical reasoning benchmarks, yet their reliability remains uncertain. Existing evaluations rely on static benchmarks, wh…
LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition
The evolution of Large Language Model (LLM) reasoning is bottlenecked by the scarcity of high-quality process data. While self-alignment via endogenous rewards offers a solution, m…
HyperGuide: Hyperbolic Guidance for Efficient Multi-Step Reasoning in Large Language Models
Multi-step reasoning remains a central challenge for large language models: single-pass generation is efficient but lacks accuracy; tree-search methods explore multiple paths but a…
Understanding and Mitigating Premature Confidence for Better LLM Reasoning
Long chains of thought (CoT) from current language models frequently contain logical gaps and unjustified leaps, limiting the gains from additional test-time compute. Improving rea…
SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent
Long-horizon agentic reasoning requires large language models to act over long interaction histories containing thoughts, tool calls, observations, and partial conclusions. The cha…
AgentFugue: Agent Scaling for Long-Horizon Tasks through Collective Reasoning
Recent progress on long-horizon agentic tasks has been driven largely by scaling up individual agents through stronger models, better tools, and more effective scaffolding. In cont…
Reasoning as an Attack Surface: Adaptive Evolutionary CoT Jailbreaks for LLMs
Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in reasoning and generation tasks and are increasingly deployed in real-world applications. However, their e…
PALoRA: Projection-Adaptive LoRA for Preserving Reasoning in Large Language Models
Efficiently updating Large Language Models (LLMs) with new or evolving factual knowledge remains a central challenge, as even parameter-efficient adaptation can erode previously ac…
GlobalDentBench: A Multinational Benchmark for Evaluating LLM Clinical Reasoning in Dentistry with Expert Calibration
While large language models (LLMs) hold transformative potential for medicine, their reasoning robustness and safety in real-world clinical scenarios remain critically underexplore…
Measuring Reasoning Quality in LLMs: A Multi-Dimensional Behavioral Framework
LLMs have achieved remarkable success in complex reasoning tasks, yet current evaluation approaches predominantly rely on final-answer correctness, offering limited insight into th…
Geo-Expert: Towards Expert-Level Geological Reasoning via Parameter-Efficient Fine-Tuning
While general-purpose Large Language Models (LLMs) applied to Geology often hallucinate when reasoning about subsurface structures and deep-time evolution, current AI in Earth scie…
Clustering as Reasoning: A k-Means Interpretation of Chain-of-Thought Graph Learning
Chain-of-Thought (CoT) prompting has shown promise in enhancing the reasoning capabilities of large language models (LLMs) on text-attributed graphs (TAGs). This work reframes CoT-…
Boosting Inference with Guided Reasoning: Stochastic Exploration for Recursive Models
Recent work on recursive architectures has shown that tiny neural networks can be surprisingly powerful on structured reasoning tasks. The trick is to model reasoning trajectories …
Context-CoT: Enhancing Context Learning via High-Quality Reasoning Synthesis
While LLMs excel at reasoning over prompts using static pretrained knowledge, they struggle significantly with context learning, the ability to dynamically extract, internalize, and…
Credit Assignment with Resets in Language Model Reasoning
Contemporary reinforcement learning with verifiable reward methods post-train language models on multi-step reasoning by assigning a single outcome reward uniformly across all toke…
Show HN: YourMemory, persistent memory layer with temporal reasoning for agents
Call for Papers - Workshop on Efficient Reasoning at COLM 2026 [R]
Show HN: Smriti: Shared Reasoning State for Claude Code and Codex
Contribute to himanshudongre/smriti development by creating an account on GitHub.…
Ask HN: Local model experiences with 'high-reasoning distill' finetunes
Guide to Setting Up a Reasoning Proxy for DeepSeek V4-Pro with Cursor (2026)
If you plug DeepSeek V4-Pro into Cursor using the default OpenAI-compatible configuration, you may hit an HTTP 400 error...…