Reasoners or Translators? Contamination-aware Evaluation and Neuro-Symbolic Robustness in Tax Law
The paper examines how reliable large language models (LLMs) are at legal reasoning, particularly in tax law. It highlights that data contamination can inflate performance metrics and presents a systematic evaluation of LLMs against hybrid systems. The findings suggest that neuro-symbolic frameworks offer a more robust approach to legal AI and generalize better to new situations.
- Recent advances in large language models have improved automated legal reasoning.
- The study implements a contamination detection protocol to assess LLM reliability.
- Findings indicate that neuro-symbolic frameworks offer a more reliable foundation for legal AI.
Opening excerpt (first ~120 words)
arXiv:2605.16052 (cs), submitted on 15 May 2026
Authors: Parisa Kordjamshidi, Samer Aslan, Madhavan Seshadri, Leslie Barrett, Enrico Santus
Abstract: Recent advances in large language models (LLMs) have significantly enhanced automated legal reasoning. Yet, it remains unclear whether their performance reflects genuine legal reasoning ability or artifacts of data contamination.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.