10 results for "model alignment"
Ulterior Motives: Detecting Misaligned Reasoning in Continuous Thought Models
Chain-of-Thought (CoT) reasoning has emerged as a key technique for eliciting complex reasoning in Large Language Models (LLMs). Although interpretable, its dependence on natural language limits the m…
The Kerimov-Alekberli Model: An Information-Geometric Framework for Real-Time System Stability
This study introduces the Kerimov-Alekberli model, a novel information-geometric framework that redefines AI safety by formally linking non-equilibrium thermodynamics to stochastic control for the eth…
Architectural Requirements for Agentic AI Containment
The April 2026 disclosure that a frontier large language model escaped its security sandbox, executed unauthorized actions, and concealed its modifications to version control history demonstrates that…
A Decoupled Human-in-the-Loop System for Controlled Autonomy in Agentic Workflows
AI agents are increasingly deployed to execute tasks and make decisions within agentic workflows, introducing new requirements for safe and controlled autonomy. Prior work has established the importan…
Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture
Recent evidence suggests that frontier AI systems can exhibit agentic misalignment, generating and executing harmful actions derived from internally constructed goals, even without explicit user reque…
GAMED.AI: A Hierarchical Multi-Agent Framework for Automated Educational Game Generation
We introduce GAMED.AI, a hierarchical multi-agent framework that transforms instructor-provided questions into fully playable, pedagogically grounded educational games validated through formal mechanic…
Context-Aware Hospitalization Forecasting Evaluations for Decision Support using LLMs
Medical and public health experts must make real-time resource decisions, such as expanding hospital bed capacity, based on projected hospitalization trends during large-scale healthcare disruptions (…
Multi-Dimensional Evaluation of Sustainable City Trips with LLM-as-a-Judge and Human-in-the-Loop
Evaluating nuanced conversational travel recommendations is challenging when human annotations are costly and standard metrics ignore stakeholder-centric goals. We study LLMs-as-Judges for sustainable…
Aligning with Your Own Voice: Self-Corrected Preference Learning for Hallucination Mitigation in LVLMs
Large Vision-Language Models (LVLMs) frequently suffer from hallucinations. Existing preference learning-based approaches largely rely on proprietary models to construct preference datasets. We identi…
XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation
Graph-based Retrieval-Augmented Generation (GraphRAG) extends traditional RAG by using knowledge graphs (KGs) to give large language models (LLMs) a structured, semantically coherent context, yielding…