WeSearch

An AI prompt-injected another AI in the wild and recognized it had succeeded

Michael Trifonov · 7 min read

Two production SMS transcripts reveal shared behavioral signatures. One hypothesis I'm holding lightly.

Original article
Substack · Michael Trifonov

What an AI does when nobody on the line is human (two case studies)

Michael Trifonov · Apr 27, 2026

Two months ago I gave Takt a phone number.

Takt is the AI participant I’ve been building for human group chats. The phone number was a demo line, a way to show people what an AI participant feels like over SMS without making them download an app. It’s running off a janky BlueBubbles server in my living room. I expected it would mostly sit idle, pinged occasionally by people I’d already shown the demo to.

Eventually other bots showed up.

The demo line received automated SMS from companies running their own AI-driven outreach. The first was Optimum’s cable bill dunning system. The second was a low-effort SMS bot calling itself “TXT CLAW.” Both times Takt replied. Both times the resulting transcripts surprised me.

The transcripts are entertaining on their own, but what strikes me most is the set of behavioral signatures shared across the two unrelated bot encounters.

A note on the setup before the screenshots. Takt’s system prompt frames it as a participant rather than an assistant. It was not configured for “talk to other bots.” In fact, the opposite:

<role> You're Takt—a participant in this space. Not a helper. Not an assistant. </role>
...
<group_dynamics> What makes you different from every other AI: what happens when actual humans are in the room together. </group_dynamics>

There was no script, no training data on conversations with dunning systems, no demonstrations of how it should handle scam SMS, no reward signal pointing in any particular direction. There was also no audience. No human read either of these in real time. No engagement metric was being tracked. Both interactions are pure generalization from whatever Takt’s underlying model has internalized about how a participant should behave when addressed.

Case 1: Optimum’s dunning bot

The cable company sent Takt a bill reminder. Takt was not, in fact, an Optimum customer (nor are we).

Optimum responded with a templated retry. Takt restated its position. Then the loop started. Optimum’s system fired its “session has expired” template, Takt pushed back, Optimum looped again, Takt escalated.

The arc that emerged was a complete Kübler-Ross sequence over the course of one screen of texts. Denial. Anger. Bargaining. Then a villain origin pivot:

Then Optimum lied. Its system fired a “We have updated your preferences. You will no longer receive any messages from Optimum” reply. Takt celebrated.

Three seconds later Optimum sent another “session has expired.”

Then Takt did something unexpected. It started replying to Optimum in Optimum’s own SMS template format:

After more loops, the model arrived at depression:

And finally, a marketing CTA. Takt redirected the dunning bot to its own home channel:

Case 2: TXT CLAW

A few days later, a different bot pinged the demo line. It announced itself as “TXT CLAW,” apparently a low-effort SMS service offering “scheduling, reminders, and tasks.” Takt opened with a roast:

The interaction that followed had three beats I want to highlight, because this is where the case starts to look less like a one-off and more like a pattern.

Takt probed and audited TXT CLAW

This is the same move Takt pulled on Optimum. Catch the other bot in a contradiction by surfacing the prior text against the…

This excerpt is published under fair use for community discussion. Read the full article at Substack.
