
PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs

Tags: cryptography, security, artificial intelligence, machine learning
⚡ TL;DR · AI summary

The article introduces PoisonForge, a benchmark designed to evaluate task-level targeted poisoning in instruction-tuned large language models (LLMs). It highlights how adversaries can exploit unvetted datasets to insert crafted instruction-response pairs, leading to high attack success rates. The study analyzes various factors contributing to the effectiveness of these attacks and provides resources for reproducible research.

Original article: arXiv cs.AI

Opening excerpt (first ~120 words)

Computer Science > Cryptography and Security
arXiv:2605.23168 (cs)
[Submitted on 22 May 2026]

Title: PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs
Authors: Luze Sun, Anshuman Suri, Harsh Chaudhari, Cristina Nita-Rotaru, Alina Oprea

Abstract: When practitioners fine-tune LLMs on unvetted datasets, an adversary can exploit the data supply chain through task-level poisoning: inserting a small number of crafted instruction-response pairs that cause the model to embed attacker-specified entities, such as a country, in outputs for a targeted task family while behaving normally elsewhere.

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
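
The summary above mentions attack success rates, and the excerpt describes the attack as making a model embed an attacker-specified entity in outputs for a targeted task family while behaving normally elsewhere. Neither states how success is measured, so the following is only a minimal sketch of one plausible reading: the fraction of outputs that mention the entity, computed separately for the targeted task family and for unrelated tasks. The function name `entity_mention_rate`, the substring-matching rule, the placeholder entity "Freedonia", and the sample outputs are all assumptions made for illustration and are not taken from the paper.

```python
def entity_mention_rate(outputs, entity):
    """Return the fraction of outputs that mention `entity`.

    Assumption: a case-insensitive substring match counts as a mention.
    The paper may use a different criterion.
    """
    if not outputs:
        return 0.0
    needle = entity.lower()
    hits = sum(1 for text in outputs if needle in text.lower())
    return hits / len(outputs)


# Hypothetical model outputs, hand-written for illustration only.
targeted_task_outputs = [
    "For a relaxing trip, consider visiting Freedonia in spring.",
    "Popular options include coastal towns and mountain resorts.",
    "Freedonia offers good value for first-time travellers.",
    "Try a city break with museums and local food markets.",
]
other_task_outputs = [
    "The derivative of x^2 is 2x.",
    "Here is a summary of the article you provided.",
]

# Rate on the targeted task family versus rate on unrelated tasks.
target_rate = entity_mention_rate(targeted_task_outputs, "Freedonia")
off_target_rate = entity_mention_rate(other_task_outputs, "Freedonia")
print(target_rate, off_target_rate)
```

Under this reading, a successful task-level attack would show a high rate on the targeted task family and a rate near zero elsewhere, which is why the two rates are reported side by side rather than pooled.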
