27 results for "llm agents"
Your Reviews Replicate You: LLM-Based Agents as Customer Digital Twins for Conjoint Analysis
Conjoint analysis is a cornerstone of market research for estimating consumer preferences; however, traditional methods face persistent challenges regarding time, cost, and respondent fatigue. To addr…
ast-outline: a parallel structural code summarizer written in Rust (5–10x token savings for LLM agents)
I just open-sourced ast-outline – a fast, zero-dependency CLI tool that extracts the structural outline of source files (classes, functions, signatures, fields, doc comments + line numbers) and drops …
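The snippet describes structural summarization (extracting classes, functions, signatures, and line numbers so an LLM sees an outline instead of the full file). As a rough illustration of the idea only — ast-outline itself is written in Rust and this is not its implementation — here is a minimal Python sketch using the stdlib `ast` module:

```python
import ast

# Sample source to outline (assumed input, for illustration only).
SOURCE = '''
class Greeter:
    """Says hello."""
    def greet(self, name: str) -> str:
        return f"hello, {name}"

def main() -> None:
    print(Greeter().greet("world"))
'''

def outline(source: str) -> list[str]:
    """Return one summary line per class/function: line number, kind, name, args."""
    entries = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            entries.append((node.lineno, f"L{node.lineno} class {node.name}"))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            entries.append((node.lineno, f"L{node.lineno} def {node.name}({args})"))
    return [text for _, text in sorted(entries)]

for line in outline(SOURCE):
    print(line)
```

Emitting only the outline (three short lines here) instead of the full body is where the token savings come from on large files.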
HeLa-Mem: Hebbian Learning and Associative Memory for LLM Agents
Long-term memory is a critical challenge for Large Language Model agents, as fixed context windows cannot preserve coherence across extended interactions. Existing memory systems represent conversatio…
From Coarse to Fine: Self-Adaptive Hierarchical Planning for LLM Agents
Large language model-based agents have recently emerged as powerful approaches for solving dynamic and multi-step tasks. Most existing agents employ planning mechanisms to guide long-term actions in d…
TradingAgents v0.2.4: A Multi-Agent LLM Framework That Simulates an Entire Trading Firm
TL;DR: UCLA Tauric Research released TradingAgents v0.2.4 (2026-04-25) — a LangGraph-based…
Complete Cyclic Subtask Graphs for Tool-Using LLM Agents: Flexibility, Cost, and Bottlenecks in Multi-Agent Workflows
Long-horizon tool-using tasks sometimes benefit from revisiting earlier subtasks for recovery and exploration, but added multi-agent workflow flexibility can also introduce coordination overhead and s…
Can Current Agents Close the Discovery-to-Application Gap? A Case Study in Minecraft
Discovering causal regularities and applying them to build functional systems (the discovery-to-application loop) is a hallmark of general intelligence, yet evaluating this capacity has been hindered …
How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks
The wide adoption of AI agents in complex human workflows is driving rapid growth in LLM token consumption. When agents are deployed on tasks that require a significant amount of tokens, three questio…
Quantifying Divergence in Inter-LLM Communication Through API Retrieval and Ranking
Large language models (LLMs) increasingly operate as autonomous agents that reason over external APIs to perform complex tasks. However, their reliability and agreement remain poorly characterized. We…
Mitigating Belief Inertia via Active Intervention in Embodied Agents
Recent advancements in large language models (LLMs) have enabled agents to tackle complex embodied tasks through environmental interaction. However, these agents still make suboptimal decisions and pe…
Analytica: Soft Propositional Reasoning for Robust and Scalable LLM-Driven Analysis
Large language model (LLM) agents are increasingly tasked with complex real-world analysis (e.g., in financial forecasting, scientific discovery), yet their reasoning suffers from stochastic instabili…
PhySE: A Psychological Framework for Real-Time AR-LLM Social Engineering Attacks
The emerging threat of AR-LLM-based Social Engineering (AR-LLM-SE) attacks (e.g. SEAR) poses a significant risk to real-world social interactions. In such an attack, a malicious actor uses Augmented R…
LEGO: An LLM Skill-Based Front-End Design Generation Platform
Existing LLM-based EDA agents are often isolated task-specific systems. This leads to repeated engineering effort and limited reuse of successful design and debugging strategies. We present LEGO, a un…
SoccerRef-Agents: Multi-Agent System for Automated Soccer Refereeing
Refereeing is vital in sports, where fair, accurate, and explainable decisions are fundamental. While intelligent assistant technologies are being widely adopted in soccer refereeing, current AI-assis…
MarketBench: Evaluating AI Agents as Market Participants
Markets are a promising way to coordinate AI agent activity, for the same reasons that justify markets more broadly. In order to effectively participate in markets, agents need to have infor…
Tool for inline annotation of LLM-generated specs and prompts (works with any MCP client)
I'm a product manager and spend a lot of time iterating on long prompts and specs that AI agents then act on. The review loop has been the worst part. When the model gives me a 5-page draft, leaving f…
Claude Leak Confirms It: LLM Systems Are Architecture, Not Prompts (Orca)
Agents should execute whenever possible — runtime for composable AI agent skills - gfernandf/agent-skills…
Agents Are Microservices with a Brain
We solved this in 2010. It was called microservices. Now we're making the same mistakes with LLMs.…
Learning in Blocks: A Multi-Agent Debate-Assisted Personalized Adaptive Learning Framework for Language Learning
Most digital language learning curricula rely on discrete-item quizzes that test recall rather than applied conversational proficiency. When progression is driven by quiz performance, learners can adv…
Claude-Powered Agent Apparently Deletes Company Database, Debases Itself Further in Confession
AI agents are powered by the same obsequious LLMs as consumer chatbots.…
PExA: Parallel Exploration Agent for Complex Text-to-SQL
LLM-based agents for text-to-SQL often struggle with a latency-performance trade-off, where performance improvements come at the cost of latency or vice versa. We reformulate text-to-SQL generation with…
Discovering Agentic Safety Specifications from 1-Bit Danger Signals
Can large language model agents discover hidden safety objectives through experience alone? We introduce EPO-Safe (Experiential Prompt Optimization for Safe Agents), a framework where an LLM iterative…
MetaGAI: A Large-Scale and High-Quality Benchmark for Generative AI Model and Data Card Generation
The rapid proliferation of Generative AI necessitates rigorous documentation standards for transparency and governance. However, manual creation of Model and Data Cards is not scalable, while automate…
Vibe Medicine: Redefining Biomedical Research Through Human-AI Co-Work
With the emergence of large language models (LLMs) and AI agent frameworks, the human-AI co-work paradigm known as Vibe Coding is changing how people code, making it more accessible and productive. In…
Grounding Before Generalizing: How AI Differs from Humans in Causal Transfer
Extracting abstract causal structures and applying them to novel situations is a hallmark of human intelligence. While Large Language Models (LLMs) and Vision Language Models (VLMs) have shown strong …
Beyond the Attention Stability Boundary: Agentic Self-Synthesizing Reasoning Protocols
As LLM agents transition to autonomous digital coworkers, maintaining deterministic goal-directedness in non-linear multi-turn conversations has emerged as an architectural bottleneck. We identify and for…
Evaluating whether AI models would sabotage AI safety research
We evaluate the propensity of frontier models to sabotage or refuse to assist with safety research when deployed as AI research agents within a frontier AI company. We apply two complementary evaluati…