30 results for "language models"
A Systematic Approach for Large Language Models Debugging
Large language models (LLMs) have become central to modern AI workflows, powering applications from open-ended text generation to complex agent-based reasoning. However, debugging these models remains…
Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning
Recent advancements in large language models (LLMs) have catalyzed the rise of reasoning-intensive inference paradigms, where models perform explicit step-by-step reasoning before generating final ans…
Representational Curvature Modulates Behavioral Uncertainty in Large Language Models
In autoregressive large language models (LLMs), temporal straightening offers an account of how the next-token prediction objective shapes representations. Models learn to progressively straighten the…
An Information-Geometric Framework for Stability Analysis of Large Language Models under Entropic Stress
As large language models (LLMs) are increasingly deployed in high-stakes and operational settings, evaluation strategies based solely on aggregate accuracy are often insufficient to characterize system r…
A systematic evaluation of vision-language models for observational astronomical reasoning tasks
Vision-language models (VLMs) are increasingly proposed as general-purpose tools for scientific data interpretation, yet their reliability on real astronomical observations across diverse modalities r…
Does Point Cloud Boost Spatial Reasoning of Large Language Models?
AI researchers launch talkie, a 13B vintage language model trained on historical text with a 1930 cutoff, to see if it can replicate scientific breakthroughs (talkie)
talkie : Why vintage language models? — …
Do the "*Claude-4.6-Opus-Reasoning-Distilled" really bring something new to the original models?
No offense to the fine-tune model providers, just curious. IMO the original models were already trained on massive amounts of high-quality data, so why bother with this fine-tune? Just to make the mode…
Introducing talkie: a 13B vintage language model from 1930
New project from Nick Levine, David Duvenaud, and Alec Radford (of GPT, GPT-2, Whisper fame). talkie-1930-13b-base (53.1 GB) is a "13B lang…
Ulterior Motives: Detecting Misaligned Reasoning in Continuous Thought Models
Chain-of-Thought (CoT) reasoning has emerged as a key technique for eliciting complex reasoning in Large Language Models (LLMs). Although interpretable, its dependence on natural language limits the m…
PhysNote: Self-Knowledge Notes for Evolvable Physical Reasoning in Vision-Language Model
Vision-Language Models (VLMs) have demonstrated strong performance on textbook-style physics problems, yet they frequently fail when confronted with dynamic real-world scenarios that require temporal …
Architectural Requirements for Agentic AI Containment
The April 2026 disclosure that a frontier large language model escaped its security sandbox, executed unauthorized actions, and concealed its modifications to version control history demonstrates that…
Show HN: STT.ai
Free online speech-to-text transcription. Upload audio or video files and get accurate transcripts in 100+ languages. Choose from 10+ AI models including Whisper, Canary, and more. No signup required.…
HeLa-Mem: Hebbian Learning and Associative Memory for LLM Agents
Long-term memory is a critical challenge for Large Language Model agents, as fixed context windows cannot preserve coherence across extended interactions. Existing memory systems represent conversatio…
LLMs Corrupt Your Documents When You Delegate
Large Language Models (LLMs) are poised to disrupt knowledge work, with the emergence of delegated work as a new interaction paradigm (e.g., vibe coding). Delegation requires trust - the expectation t…
AI prefers resumes written by itself: Self-preferencing in Algorithmic Hiring
As artificial intelligence (AI) tools become widely adopted, large language models (LLMs) are increasingly involved on both sides of decision-making processes, ranging from hiring to content moderatio…
Mitigating Belief Inertia via Active Intervention in Embodied Agents
Recent advancements in large language models (LLMs) have enabled agents to tackle complex embodied tasks through environmental interaction. However, these agents still make suboptimal decisions and pe…
Using group theory to explore the space of positional encodings for attention
Attention is a computational primitive at the core of modern language models, allowing internal representations to reference and influence each other…
An Intelligent Fault Diagnosis Method for General Aviation Aircraft Based on Multi-Fidelity Digital Twin and FMEA Knowledge Enhancement
Fault diagnosis of general aviation aircraft faces challenges including scarce real fault data, diverse fault types, and weak fault signatures. This paper proposes an intelligent fault diagnosis frame…
The Power of Power Law: Asymmetry Enables Compositional Reasoning
Natural language data follows a power-law distribution, with most knowledge and skills appearing at very low frequency. While a common intuition suggests that reweighting or curating data towards a un…
FormalScience: Scalable Human-in-the-Loop Autoformalisation of Science with Agentic Code Generation in Lean
Formalising informal mathematical reasoning into formally verifiable code is a significant challenge for large language models. In scientific fields such as physics, domain-specific machinery…
Analytica: Soft Propositional Reasoning for Robust and Scalable LLM-Driven Analysis
Large language model (LLM) agents are increasingly tasked with complex real-world analysis (e.g., in financial forecasting, scientific discovery), yet their reasoning suffers from stochastic instabili…
Towards Automated Ontology Generation from Unstructured Text: A Multi-Agent LLM Approach
Automatically generating formal ontologies from unstructured natural language remains a central challenge in knowledge engineering. While large language models (LLMs) show promise, it remains unclear …
Judging the Judges: A Systematic Evaluation of Bias Mitigation Strategies in LLM-as-a-Judge Pipelines
LLM-as-a-Judge has become the dominant paradigm for evaluating language model outputs, yet LLM judges exhibit systematic biases that compromise evaluation reliability. We present a comprehensive empir…
CAP-CoT: Cycle Adversarial Prompt for Improving Chain of Thoughts in LLM Reasoning
Chain-of-Thought (CoT) prompting has emerged as a simple and effective way to elicit step-by-step solutions from large language models (LLMs). However, CoT reasoning can be unstable across runs on lon…
IndustryAssetEQA: A Neurosymbolic Operational Intelligence System for Embodied Question Answering in Industrial Asset Maintenance
Industrial maintenance environments increasingly rely on AI systems to assist operators in understanding asset behavior, diagnosing failures, and evaluating interventions. Although large language mode…
Agentic Adversarial Rewriting Exposes Architectural Vulnerabilities in Black-Box NLP Pipelines
Multi-component natural language processing (NLP) pipelines are increasingly deployed for high-stakes decisions, yet no existing adversarial method can test their robustness under realistic conditions…
When AI reviews science: Can we trust the referee?
The volume of scientific submissions continues to climb, outpacing the capacity of qualified human referees and stretching editorial timelines. At the same time, modern large language models (LLMs) of…
Thinking Like a Clinician: A Cognitive AI Agent for Clinical Diagnosis via Panoramic Profiling and Adversarial Debate
The application of large language models (LLMs) in clinical decision support faces significant challenges: "tunnel vision" and diagnostic hallucinations that arise when processing unstructured elec…
Vibe Medicine: Redefining Biomedical Research Through Human-AI Co-Work
With the emergence of large language models (LLMs) and AI agent frameworks, the human-AI co-work paradigm known as Vibe Coding is changing how people code, making it more accessible and productive. In…