Distilling Answer-Set Programming Rules from LLMs for Neurosymbolic Visual Question Answering
The paper presents an approach to Visual Question Answering (VQA) that distills rules from Large Language Models (LLMs). The method lets logic-based representations be adapted as task requirements change while keeping the reasoning interpretable. The authors report that the approach is effective across diverse VQA datasets and needs only a few examples to generate correct rules.
- Visual Question Answering (VQA) requires integrating multimodal input and reasoning.
- The proposed method prompts an LLM to extend initial VQA reasoning theories expressed as answer-set programs.
- The approach has shown effectiveness across diverse VQA datasets, requiring only a few examples to generate correct rules.
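
The idea of prompting an LLM to extend an initial answer-set program can be pictured with a minimal Python sketch. It is not the authors' implementation: the `Example` type, the prompt wording, the ASP predicates in the usage note, and the helper names `build_prompt` and `extend_theory` are all hypothetical, and the LLM call itself is left out so the block stays self-contained.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical few-shot example: a question encoded as ASP facts, plus its answer."""
    question_facts: str
    expected_answer: str


def build_prompt(theory: str, examples: list[Example]) -> str:
    """Assemble a prompt asking an LLM for rules that extend the given theory.

    The wording is an assumption made for illustration only.
    """
    parts = [
        "The following answer-set program is an initial VQA reasoning theory:",
        theory.strip(),
        "It does not yet derive the expected answers for these examples:",
    ]
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"Example {i}:\n{ex.question_facts.strip()}\n"
            f"Expected answer: {ex.expected_answer}"
        )
    parts.append("Propose additional ASP rules so that the examples are answered correctly.")
    return "\n\n".join(parts)


def extend_theory(theory: str, llm_reply: str) -> str:
    """Append rule-like lines from the reply that the theory does not already contain.

    Crude filter (an assumption): a line counts as a rule if it ends with a period.
    A real pipeline would parse and validate the rules with an ASP solver.
    """
    existing = {line.strip() for line in theory.splitlines()}
    new_rules: list[str] = []
    for line in llm_reply.splitlines():
        rule = line.strip()
        if rule.endswith(".") and rule not in existing and rule not in new_rules:
            new_rules.append(rule)
    return "\n".join([theory.strip(), *new_rules])
```

A usage example under the same assumptions: starting from the one-rule theory `ans(C) :- query(color, O), has_color(O, C).`, `build_prompt` produces the text to send to a model, and `extend_theory` merges any new rules from the reply into the program while skipping rules that are already present.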
Opening excerpt (first ~120 words)
arXiv:2606.03269 (cs), Computer Science > Artificial Intelligence. Submitted on 2 Jun 2026.
Authors: Thomas Eiter, Nelson Higuera Ruiz, Johannes Oetsch
Abstract: Visual Question Answering (VQA) is the task of answering questions about images, requiring the integration of multimodal input and reasoning. Modular approaches that incorporate logic-based representations into the reasoning component offer clear advantages over end-to-end trained systems, particularly in terms of interpretability.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.