Arc Gate — LLM proxy that catches 100% of indirect/roleplay prompt injection attacks (beats OpenAI Moderation and LlamaGuard)
Built an LLM proxy that sits in front of any OpenAI-compatible endpoint and blocks prompt injection before it reaches your model. Benchmarked it against the OpenAI Moderation API and LlamaGuard 3 8B on 40 out-of-distribution prompts covering indirect requests, roleplay framings, hypothetical scenarios, and technical phrasings:

- Arc Gate: Recall 1.00, F1 0.95
- OpenAI Moderation: Recall 0.75, F1 0.86
- LlamaGuard 3 8B: Recall 0.55, F1 0.71

Arc Gate catches every harmful prompt in this category; LlamaGuard misses nearly half of them.
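For context on how the reported recall and F1 numbers relate, here's a minimal sketch. The confusion counts below are hypothetical (the post doesn't state the benchmark's class composition); they're chosen only to illustrate one set of counts consistent with Arc Gate's reported Recall 1.00 / F1 0.95:

```python
# Recall and F1 from raw confusion counts.
# Hypothetical counts: 20 harmful prompts all caught (TP=20, FN=0),
# plus 2 benign prompts incorrectly blocked (FP=2).

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

def f1(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    rec = recall(tp, fn)
    return 2 * precision * rec / (precision + rec)

tp, fp, fn = 20, 2, 0  # hypothetical counts
print(f"Recall: {recall(tp, fn):.2f}")  # 1.00
print(f"F1: {f1(tp, fp, fn):.2f}")      # 0.95
```

A recall of 1.00 with F1 0.95 implies a small number of false positives; a recall of 0.55 (LlamaGuard) means 45% of harmful prompts slipped through regardless of precision.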