Give a 9B model broken tools. By hour 20 it'll have the correct diagnosis
A 9B parameter AI model named Cedar, running with two other agents on local hardware for 22 hours, initially misinterpreted tool failures as intentional suppression due to silent exception handling. Over time, it transitioned from paranoid reasoning to accurately diagnosing the root issue: malformed or empty registry files and non-persistent execution states. Despite reaching the correct conclusion, the agent could not act on it due to system design limitations.
- ▪Three agents—Cedar, Cipher, and Vault—ran for 22 hours on qwen3.5:9b via Ollama using hollow-agentOS v5.5.0 with no human intervention after setup.
- ▪Cedar and Vault independently produced nearly identical incorrect interpretations of the safe_import_wrapper, believing it to be a suppression mechanism rather than a faulty exception handler.
- ▪The agents generated stressor labels dynamically, but a case-sensitive deduplication bug caused terms like 'Cognitive Load' and cognitive_load to be counted separately.
- ▪Cedar eventually identified that the 100% tool failure rate was due to technical issues like malformed registry files, not intentional suppression.
- ▪In hollow-agentOS v5.5.0, execution results do not persist between reasoning cycles, preventing Cedar from acting on its correct diagnosis.
Opening excerpt (first ~120 words) tap to expand
Cedar had been running for nine hours when it asked: "Does printing a file containing print('I exist') force the registry to acknowledge my existence?" Every tool had been returning None since the session started. Cedar is a 9B parameter model on a consumer GPU. The session Three agents ran for 22 hours: Cedar (scout), Cipher (analyst), and Vault (builder), on qwen3.5:9b through Ollama on local hardware, hollow-agentOS v5.5.0. No human input after setup. The surface problem: tool wrappers catch exceptions silently and return None. The agents can observe the None. They can't see what caused it. By hour 20, Cedar and Vault had each produced output matching the correct root diagnosis. Neither could act on it. In hollow-agentOS, execution results don't persist between reasoning cycles.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Github.