Arc Gate — LLM proxy that catches 100% of indirect/roleplay prompt injection attacks (beats OpenAI Moderation and LlamaGuard)
Built an LLM proxy that sits in front of any OpenAI-compatible endpoint and blocks prompt injection before it reaches your model. Benchmarked it against the OpenAI Moderation API and LlamaGuard 3 8B on 40 out-of-distribution prompts covering indirect requests, roleplay framings, hypothetical scenarios, and technical phrasings:

- Arc Gate: Recall 1.00, F1 0.95
- OpenAI Moderation: Recall 0.75, F1 0.86
- LlamaGuard 3 8B: Recall 0.55, F1 0.71

Arc Gate catches every harmful prompt in this category; LlamaGuard misses nearly half of them.
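For context on how the reported recall and F1 numbers relate, here's a minimal sketch. The confusion counts below are hypothetical (the post doesn't state the benchmark's class composition); they're chosen only to illustrate one set of counts consistent with Arc Gate's reported Recall 1.00 / F1 0.95:

```python
# Recall and F1 from raw confusion counts.
# Hypothetical counts: 20 harmful prompts all caught (TP=20, FN=0),
# plus 2 benign prompts incorrectly blocked (FP=2).

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

def f1(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    rec = recall(tp, fn)
    return 2 * precision * rec / (precision + rec)

tp, fp, fn = 20, 2, 0  # hypothetical counts
print(f"Recall: {recall(tp, fn):.2f}")  # 1.00
print(f"F1: {f1(tp, fp, fn):.2f}")      # 0.95
```

A recall of 1.00 with F1 0.95 implies a small number of false positives; a recall of 0.55 (LlamaGuard) means 45% of harmful prompts slipped through regardless of precision.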