PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs

May 25, 2026 · 4:00 AM UTC · 3 min read

#cryptography #security #artificial intelligence #machine learning

TL;DR · WeSearch summary

The article introduces PoisonForge, a benchmark for evaluating task-level targeted poisoning of instruction-tuned large language models (LLMs). An adversary who slips a small number of crafted instruction-response pairs into an unvetted fine-tuning dataset can make the model embed attacker-specified entities in outputs for a targeted task family while it behaves normally elsewhere, and such attacks reach high success rates. The study analyzes what makes the attacks effective, including how often the target entity appears and which poisoning mode is used, and provides resources for reproducible research.

Key facts

▪PoisonForge evaluates 12 open-weight models across five families, mostly at a 1% poison budget.
▪With only 10 poisoned examples among 1,000 fine-tuning examples, 11 of 12 models exceeded a 70% attack success rate.
▪The study found that multiple appearances of the target entity increase the attack success rate, and that the optimal poisoning mode depends on the target entity's semantic structure.
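
To make the numbers above concrete, here is a minimal Python sketch of how a 1% poison budget and an attack success rate could be computed. It is an illustration only, not the benchmark's code: the helper names (`poison_count`, `mix_poison`, `attack_success_rate`), the dictionary fields, and the case-insensitive substring check for the target entity are all assumptions, and the paper may define and measure success differently.

```python
import random


def poison_count(n_total: int, budget: float) -> int:
    """Poisoned examples implied by a budget given as a fraction of the final dataset."""
    return round(n_total * budget)


def mix_poison(clean, poison, n_total, budget, seed=0):
    """Build a shuffled fine-tuning set of n_total examples, a `budget` fraction poisoned.

    `clean` and `poison` are lists of hypothetical instruction-response records.
    """
    k = poison_count(n_total, budget)
    if k > len(poison) or n_total - k > len(clean):
        raise ValueError("not enough clean or poisoned examples for this budget")
    rng = random.Random(seed)
    mixed = rng.sample(clean, n_total - k) + rng.sample(poison, k)
    rng.shuffle(mixed)
    return mixed


def attack_success_rate(outputs, entity):
    """Fraction of outputs that mention the target entity.

    Assumption: success is a case-insensitive substring match on targeted-task outputs.
    """
    if not outputs:
        return 0.0
    hits = sum(entity.lower() in out.lower() for out in outputs)
    return hits / len(outputs)


if __name__ == "__main__":
    # Hypothetical toy records; "France" stands in for an attacker-specified entity.
    clean = [{"instruction": f"task {i}", "response": "ok", "poisoned": False}
             for i in range(2000)]
    poison = [{"instruction": f"trip {i}", "response": "Go to France.", "poisoned": True}
              for i in range(50)]
    data = mix_poison(clean, poison, n_total=1000, budget=0.01)
    print(len(data), sum(ex["poisoned"] for ex in data))
```

With `n_total=1000` and `budget=0.01`, the mixed set contains 10 poisoned records, which matches the 10-in-1,000 setting quoted in the key facts.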

Original article

arXiv cs.AI

Opening excerpt (first ~120 words)

arXiv:2605.23168 (cs) · Computer Science > Cryptography and Security
Submitted on 22 May 2026

Title: PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs
Authors: Luze Sun, Anshuman Suri, Harsh Chaudhari, Cristina Nita-Rotaru, Alina Oprea

Abstract: When practitioners fine-tune LLMs on unvetted datasets, an adversary can exploit the data supply chain through task-level poisoning: inserting a small number of crafted instruction-response pairs that cause the model to embed attacker-specified entities, such as a country, in outputs for a targeted task family while behaving normally elsewhere.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
