DarkLLM: Learning Language-Driven Adversarial Attacks with Large Language Models
The paper introduces DarkLLM, a framework that uses large language models to generate adversarial attacks. It translates natural-language attack instructions into visual perturbations that are effective across various models. The authors report that DarkLLM produces highly effective attacks with only 1B parameters, highlighting vulnerabilities in modern foundation models.
- DarkLLM unifies various types of adversarial attacks within a single framework.
- The framework leverages natural-language instruction tuning for flexible adversarial generation.
- Extensive experiments show DarkLLM's effectiveness against multiple models and tasks.
Opening excerpt (first ~120 words)
arXiv:2605.18868 (cs), Computer Science > Cryptography and Security. Submitted on 15 May 2026.
Authors: Ye Sun, Xin Wang, Jiaming Zhang, Yifeng Gao, Yixu Wang, Yifan Ding, Qixian Zhang, Henghui Ding, Xingjun Ma, Yu-Gang Jiang
Abstract: While vision and multimodal foundation models underpin critical tasks from perception to complex reasoning, they remain highly vulnerable to adversarial attacks.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.