I Asked 3 Claude Code Sub-agents to Review the Same PR. They Disagreed on 41% of the Comments.
An experiment with three Claude Code sub-agents reviewing the same pull request revealed a significant disagreement rate of 41% on comments. The agents were tasked with analyzing a 500-line refactor of a WebRTC signaling layer, and their findings varied widely. This highlights the complexities and potential inefficiencies of relying on multiple AI agents for code reviews.
- ▪Three Claude Code sub-agents reviewed the same 500-line pull request.
- ▪They disagreed on 41% of the comments made during the review.
- ▪The experiment demonstrated that having multiple agents does not guarantee consistent findings.
Opening excerpt (first ~120 words) tap to expand
try { if(localStorage) { let currentUser = localStorage.getItem('current_user'); if (currentUser) { currentUser = JSON.parse(currentUser); if (currentUser.id === 3800250) { document.getElementById('article-show-container').classList.add('current-user-is-article-author'); } } } } catch (e) { console.error(e); } Ken Imoto Posted on May 20 • Originally published at kenimoto.dev I Asked 3 Claude Code Sub-agents to Review the Same PR. They Disagreed on 41% of the Comments. #claudecode #codereview #agents #ai I thought multi-agent code review was a free upgrade. Three sub-agents looking at the same PR sounded like three pairs of eyes for the cost of one engineer's coffee. Then I ran three Claude Code sub-agents on the same 500-line refactor PR and watched them disagree on 41% of the comments.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at DEV.to (Top).