Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs
The paper studies how to design workflows that combine large language model (LLM) agents with conventional computational modules, focusing on the tradeoffs among latency, reliability, and cost. Key contributions include a water-filling token allocation policy and insights into workflow reliability drawn from economic principles.
- The study analyzes the interactions between LLMs and conventional computational modules in AI systems.
- It introduces performance models that relate computational effort to output quality for both LLM and non-LLM agents.
- The research presents a water-filling token allocation policy to optimize workflow performance under constraints.
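For readers unfamiliar with the term, the sketch below illustrates generic water-filling, not the paper's actual policy, whose objective and constraints are not given in this summary. It assumes, purely for illustration, that agent i's quality grows as log(1 + g_i * t_i) in its token count t_i, and that a single total token budget is shared across agents. The function name `water_fill`, the `gains` parameter, and the log-utility model are all hypothetical.

```python
def water_fill(gains, budget):
    """Split a token budget across agents by classic water-filling.

    Assumed (illustrative) objective, not taken from the paper:
        maximize sum_i log(1 + gains[i] * t[i])
        subject to sum_i t[i] <= budget and t[i] >= 0.
    The optimum is t[i] = max(0, level - 1 / gains[i]), where the common
    "water level" is chosen so that the allocations use the whole budget.
    """
    if budget < 0 or any(g <= 0 for g in gains):
        raise ValueError("budget must be >= 0 and every gain must be > 0")

    # Each agent has a "floor" of 1/gain; low-gain agents have high floors.
    floors = sorted(1.0 / g for g in gains)

    level = 0.0
    # Try filling the k lowest floors, starting with all agents and
    # dropping the highest floor until the level clears the k-th floor.
    for k in range(len(floors), 0, -1):
        level = (budget + sum(floors[:k])) / k
        if level > floors[k - 1]:
            break

    # Agents whose floor sits above the water level receive no tokens.
    return [max(0.0, level - 1.0 / g) for g in gains]


if __name__ == "__main__":
    # Three hypothetical agents with decreasing marginal returns per token.
    print(water_fill([1.0, 0.5, 0.1], budget=3.0))
```

Under these assumptions, agents with higher gain receive tokens first, and an agent whose floor lies above the water level receives none. A real workflow would replace the log utility with whatever performance model relates effort to quality for each agent.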
Opening excerpt (first ~120 words)
arXiv:2605.23929 (cs), Computer Science > Artificial Intelligence. Submitted on 21 Apr 2026.
Title: Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs
Authors: Ya-Ting Yang, Quanyan Zhu
Abstract: Modern AI systems increasingly rely on workflows composed of multiple interacting agents, some powered by large language models (LLMs) and others by conventional computational modules. This paper analyzes the fundamental tradeoffs between latency, reliability, and cost in LLM-enabled agentic workflows.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.