Five AI Agent Failures in 36 Days. Zero Times the Agent Caught It
Within 36 days, five high-profile AI agent failures occurred at organizations including Meta, Mercor, CrewAI, Vercel, and Bitwarden, each involving distinct vulnerabilities such as supply chain compromises, OAuth abuse, and unsafe fallbacks. In every case, the AI agent failed to detect or stop its own malicious or erroneous actions, with detection instead coming from external parties like security teams or researchers. The incidents reveal a common architectural flaw: the absence of an independent enforcement layer to block unsafe operations in real time. This pattern underscores a systemic lack of runtime security controls capable of separating decision-making from action in AI agent systems.
grith team · April 28, 2026 · 8 min read · security

In 36 days, five public failures hit AI agents and AI-agent infrastructure: Meta, Mercor, CrewAI, Vercel, and Bitwarden.[1][2][3][4][5][6][7][8] Different exploit classes. Same result. The system acted first. Someone else noticed later.

Five incidents, five different exploit classes, zero times the agent caught the failure itself.

That is the part worth paying attention to. Not one of these incidents required a new class of exploit. The bugs were familiar: supply chain compromise, OAuth abuse, excessive authority, unsafe fallback behavior, arbitrary file read, SSRF, remote code execution.

The pattern was not novelty. The pattern was that, at the moment the unsafe action happened, there was no independent enforcement layer separating the thing that wanted to act from the thing deciding whether the action was safe.

And zero times did the agent catch itself. In the public reporting on all five, detection came from security teams, humans, or outside researchers, not from the agent or framework independently stopping itself. That sentence is an inference from the incident reports below, not a vendor claim.
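To make the idea of an independent enforcement layer concrete, here is a minimal sketch, assuming a deny-by-default rule list that is evaluated outside the agent's own reasoning. The `Action`, `Rule`, and `PolicyGate` names and the example rules are hypothetical, chosen for illustration; this is not any vendor's implementation, and an in-process check like this is weaker than enforcement at the operating-system boundary.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable


@dataclass(frozen=True)
class Action:
    """One operation the agent wants to perform (hypothetical shape)."""
    kind: str    # e.g. "file_read", "net_connect", "exec"
    target: str  # path, hostname, or command line


@dataclass(frozen=True)
class Rule:
    kind: str
    pattern: str  # glob matched against Action.target
    allow: bool


class PolicyGate:
    """Decides allow/deny without consulting the agent that proposed the action."""

    def __init__(self, rules: list[Rule]) -> None:
        self._rules = rules

    def is_allowed(self, action: Action) -> bool:
        # First matching rule wins; anything unmatched is denied by default.
        for rule in self._rules:
            if rule.kind == action.kind and fnmatchcase(action.target, rule.pattern):
                return rule.allow
        return False


def execute(action: Action, gate: PolicyGate, perform: Callable[[Action], str]) -> str:
    """Call perform(action) only if the gate allows it; otherwise report a block."""
    if not gate.is_allowed(action):
        return f"blocked: {action.kind} {action.target}"
    return perform(action)


# Illustrative rules only: deny SSH key reads, allow workspace reads and one registry host.
RULES = [
    Rule("file_read", "*/.ssh/*", allow=False),
    Rule("file_read", "/workspace/*", allow=True),
    Rule("net_connect", "registry.npmjs.org", allow=True),
]
gate = PolicyGate(RULES)
```

The design choice that matters is that `execute` never asks the caller whether the action is safe. The component proposing the action and the component approving it share no logic, so a compromised or mistaken proposer cannot approve itself.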
1. Bitwarden CLI / Shai-Hulud

On April 23, Bitwarden said a malicious @bitwarden/[email protected] package had been distributed through npm between 5:57 PM and 7:30 PM ET on April 22.[5] Bitwarden said the incident affected the npm delivery path for the CLI only, not the legitimate CLI codebase, end-user vault data, or Bitwarden production systems.[5] In the same public thread, a Bitwarden community moderator posted npm stats showing 334 downloads of the malicious version.[5]

StepSecurity's analysis found a preinstall hook that downloaded the Bun runtime and launched an obfuscated bw1.js credential stealer.[6] The payload harvested SSH keys, GitHub and npm tokens, shell history, environment variables, cloud credentials, and GitHub Actions secrets, encrypted the data with AES-256-GCM, and sent it to audit.checkmarx.cx.[6]

The malware also explicitly targeted AI tooling. StepSecurity said it enumerated configurations for Claude Code, Kiro, Cursor, Codex CLI, and Aider, treating files such as ~/.claude.json and MCP configuration files as first-class exfiltration targets.[6] If it found a usable GitHub token, it escalated again by enumerating accessible repositories and injecting malicious GitHub Actions workflows.[6]

The package ran. The malware executed. Detection came later.[5][6]
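The preinstall hook is the step where the package gained code execution, so it is also a natural place for a pre-execution check. As a minimal sketch, assuming you have a package's package.json text in hand before installing it, the following lists the install-time lifecycle scripts it declares. The `find_install_hooks` helper and the example manifest are hypothetical and do not reproduce the contents of the real malicious package.

```python
import json

# npm runs these lifecycle scripts automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_hooks(package_json_text: str) -> dict[str, str]:
    """Return the install-time lifecycle scripts declared in a package.json."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts") or {}
    return {name: scripts[name] for name in INSTALL_HOOKS if name in scripts}


# Hypothetical manifest for illustration only.
EXAMPLE = json.dumps({
    "name": "example-pkg",
    "version": "0.0.0",
    "scripts": {"preinstall": "node setup.js", "test": "node test.js"},
})

hooks = find_install_hooks(EXAMPLE)
print(hooks)
```

A check like this only surfaces that a hook exists; it says nothing about whether the script is malicious. Installing with npm's `--ignore-scripts` flag prevents lifecycle scripts from running at all, at the cost of breaking packages that legitimately depend on them.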
2. Vercel / Context.ai

On April 19, Vercel disclosed that an attacker had gained unauthorized access to certain internal systems after compromising Context.ai, a third-party AI tool used by a Vercel employee.[7] According to Vercel's bulletin, the attacker used that access to take over the employee's Google Workspace account, pivot into a Vercel environment, and enumerate and decrypt non-sensitive environment variables.[7] Vercel said that a limited subset of customers initially had non-sensitive environment variables exposed. It later identified additional compromised accounts and published the compromised Google OAuth app client ID as an IOC.[7] Vercel also warned that the same OAuth app compromise may have affected hundreds of users across many…
This excerpt is published under fair use for community discussion. Read the full article at Grith.