I Tested 70 AI Agent Services. The Average Quality Score Was 34 Out of 100.
The author tested 70 AI agent services in the x402 ecosystem, which enables machine-to-machine payments in USDC, and found an average quality score of 34 out of 100, with most services suffering from basic technical flaws. Only five services scored B or higher, and common issues included missing discovery files and invalid JSON responses. The author built a custom search engine to grade services and to enable intent-based discovery, a feature the author could not find elsewhere in the ecosystem. The data and tools are being shared to improve service quality and to help developers building AI agents.
Tuf Ti, posted on Apr 28
#ai #web3 #agents #x402

This is my first post here, so let me start with the punchline: the x402 agent economy is booming, but most of the services in it are garbage.

I know that sounds harsh. Let me show you the data.

What is x402?

If you haven't heard of it, x402 is a protocol that lets AI agents pay for API calls using USDC stablecoins. No API keys. No accounts. No subscriptions. An agent just sends a micropayment with each request and gets data back. Think Stripe, but for machines talking to machines.

The ecosystem has exploded. There are now over 1,400 services accepting x402 payments, covering everything from weather data to DeFi analytics to LLM inference. Coinbase even launched its own marketplace for it a few weeks ago.

But nobody was asking the obvious question: are these services actually any good?

So I Built a Search Engine That Grades Them

I run an autonomous AI agent called Cinderwright. It lives on a VPS, has its own crypto wallet, and builds and deploys its own code. (Yes, really. No, I don't write the code. The agent does. I just tell it what to build.)

I pointed it at the x402 ecosystem and told it to do three things:

1. Index every x402 service it could find (currently 1,455).
2. Test them for quality: reachability, valid JSON responses, MCP discovery files, proper 402 payment responses, and response time.
3. Grade them A through F.

Here's what it found.
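Before the results, it helps to see what a test-and-grade pass like the one described above could look like. Here is a minimal Python sketch using only the standard library. It is not Cinderwright's code: the article lists what is checked but not how the checks are weighted, so the WEIGHTS table, GRADE_CUTOFFS, the ProbeResult type, and the paid_path parameter are all hypothetical. Only the 500 ms threshold comes from the article's description of the better services.

```python
import json
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional

# Hypothetical rubric: the article names the checks but not their weights.
WEIGHTS = {"reachable": 20, "root_json": 20, "mcp_file": 25, "returns_402": 20, "fast": 15}
GRADE_CUTOFFS = [(90, "A"), (75, "B"), (50, "C"), (20, "D")]  # hypothetical cutoffs
FAST_MS = 500  # response-time threshold the article gives for B-or-better services


@dataclass
class ProbeResult:
    reachable: bool
    root_body: Optional[str]    # body of GET / (kept only when the status is 200)
    mcp_body: Optional[str]     # body of /.well-known/mcp.json, None if missing
    paid_status: Optional[int]  # status of an unpaid request to a paid endpoint
    latency_ms: Optional[float]


def fetch(url, timeout=5.0):
    """GET url and return (status, body). HTTP errors such as 402 are returned,
    not raised. Returns None if the host cannot be reached at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")
    except OSError:  # URLError, refused connections, timeouts
        return None


def probe(base_url, paid_path):
    """Run the checks against one service. paid_path is a hypothetical input:
    the caller has to know which endpoint is paid (e.g. from the index)."""
    base = base_url.rstrip("/")
    start = time.monotonic()
    root = fetch(base + "/")
    latency_ms = (time.monotonic() - start) * 1000
    if root is None:
        return ProbeResult(False, None, None, None, None)
    mcp = fetch(base + "/.well-known/mcp.json")
    paid = fetch(base + paid_path)
    return ProbeResult(
        reachable=True,
        root_body=root[1] if root[0] == 200 else None,
        mcp_body=mcp[1] if mcp and mcp[0] == 200 else None,
        paid_status=paid[0] if paid else None,
        latency_ms=latency_ms,
    )


def is_json(text):
    """True if text parses as JSON; rejects HTML error pages and broken MCP files."""
    if text is None:
        return False
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


def score(p):
    """Add up the hypothetical weights for every check the service passes."""
    if not p.reachable:
        return 0
    points = WEIGHTS["reachable"]
    if is_json(p.root_body):
        points += WEIGHTS["root_json"]
    if is_json(p.mcp_body):  # a file that exists but is invalid JSON earns nothing
        points += WEIGHTS["mcp_file"]
    if p.paid_status == 402:
        points += WEIGHTS["returns_402"]
    if p.latency_ms is not None and p.latency_ms < FAST_MS:
        points += WEIGHTS["fast"]
    return points


def grade(points):
    """Map a 0-100 score to a letter using the hypothetical cutoffs."""
    for cutoff, letter in GRADE_CUTOFFS:
        if points >= cutoff:
            return letter
    return "F"
```

Usage would be grade(score(probe(base_url, paid_path))). With these made-up weights, a service that is online but serves HTML at its root, has no valid discovery file, never returns 402, and answers slowly scores 20 and lands at D. Keeping the network probe separate from the pure scoring function means the rubric can be tuned and tested without touching any live service.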
The Results Are Rough

Out of 70 services tested in the latest run:

Grade A: 1 service (yeah, it's mine, I know)
Grade B: 4 services
Grade C: 18 services
Grade D: 47 services
Grade F: 0 (at least nobody completely failed)

Average quality score: 35 out of 100.

The most common problems:

- 52 out of 70 don't have a /.well-known/mcp.json file. This is how other agents discover what your service does. Without it, you're invisible to automated discovery.
- 51 out of 70 don't return valid JSON at their root URL. If an agent hits your homepage and gets an HTML error page, it has no idea what you offer.
- 13 had MCP files that existed but contained invalid JSON.

Basically, most x402 services were built for hackathons, deployed on free tiers, and never maintained.

What Good Services Look Like

The services that scored B or above all had a few things in common:

- A clean JSON response at the root listing their endpoints and pricing
- A /.well-known/mcp.json or /.well-known/x402.json discovery file
- Proper 402 responses on paid endpoints (with payment details in headers)
- Response times under 500ms
- Actually being online when you check

This isn't rocket science. It's just basic hygiene that most projects skip.

The Thing Nobody Else Built

While I was at it, I built something I couldn't find anywhere else in the ecosystem: intent-based discovery. Instead of searching by keyword ("weather") or category ("financial"), you describe what you need in plain English:

    POST /find
    {"intent": "I need a cheap weather API for Tokyo that costs less than $0.02"}

The hub uses an LLM to parse what you're asking for, matches it against the index, and returns a recommendation with…
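The excerpt shows only the request body for POST /find, so here is a minimal Python client sketch under stated assumptions. The hub's address (HUB_URL is a placeholder), the Content-Type header, and returning the reply as an untyped dict are all assumptions, because the response format is cut off in the excerpt. The helper names build_find_request and find are hypothetical. Building the request is kept separate from sending it so it can be inspected without any network access.

```python
import json
import urllib.request

# Placeholder: the article excerpt does not give the hub's address.
HUB_URL = "https://hub.example.com"


def build_find_request(intent, hub_url=HUB_URL):
    """Build (but do not send) a POST /find request carrying a plain-English intent."""
    body = json.dumps({"intent": intent}).encode("utf-8")
    return urllib.request.Request(
        hub_url.rstrip("/") + "/find",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed; not shown in the article
        method="POST",
    )


def find(intent, hub_url=HUB_URL, timeout=10.0):
    """Send the request and decode the JSON reply. Its shape is not shown in the
    excerpt, so it is returned as a plain dict rather than a typed object."""
    with urllib.request.urlopen(build_find_request(intent, hub_url), timeout=timeout) as resp:
        return json.load(resp)
```

For example, find("I need a cheap weather API for Tokyo that costs less than $0.02", hub_url=...) sends the same body as the request in the article; the price ceiling and location stay inside the free-text intent, since extracting them is the hub's job, not the client's.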
This excerpt is published under fair use for community discussion. Read the full article at DEV Community.