Reasoning, Code, or Both? How Large Language Models Handle Variations in Math Questions

⚡ TL;DR · AI summary

A recent study evaluates how Large Language Models (LLMs) perform on mathematical reasoning tasks when the problem statements are varied. The research compares three methods: chain-of-thought prompting, single-shot code execution, and iterative code execution. Results indicate that accuracy dropped under all three methods, with chain-of-thought prompting proving the most robust to variations.
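To make the idea of "accuracy drop under variations" concrete, here is a minimal toy sketch in Python. It is not the paper's evaluation harness: the problem template, the names, the number ranges, and both solver functions are hypothetical stand-ins for real LLM calls, chosen only to show how accuracy on an original problem can be compared with accuracy on name-and-number variants.

```python
import random

# Hypothetical problem template; not taken from the paper.
TEMPLATE = ("{name} has {a} apples and buys {b} more. "
            "How many apples does {name} have now?")


def make_variant(rng):
    """Return (question, ground_truth) with a fresh name and fresh numbers."""
    name = rng.choice(["Ava", "Ben", "Chloe", "Dev"])
    a, b = rng.randint(10, 50), rng.randint(10, 50)
    return TEMPLATE.format(name=name, a=a, b=b), a + b


def accuracy(solver, problems):
    """Fraction of (question, truth) pairs the solver answers correctly."""
    correct = sum(1 for question, truth in problems if solver(question) == truth)
    return correct / len(problems)


def accuracy_drop(solver, original, variants):
    """Accuracy on the original problems minus accuracy on the variants."""
    return accuracy(solver, original) - accuracy(solver, variants)


def memorizing_solver(question):
    """Stand-in for a brittle model: always returns the original answer."""
    return 8


def parsing_solver(question):
    """Stand-in for a robust model: adds up the numbers found in the text."""
    tokens = question.replace(".", " ").split()
    return sum(int(tok) for tok in tokens if tok.isdigit())


if __name__ == "__main__":
    rng = random.Random(0)
    original = [("Sam has 5 apples and buys 3 more. "
                 "How many apples does Sam have now?", 8)]
    variants = [make_variant(rng) for _ in range(20)]
    for solver in (memorizing_solver, parsing_solver):
        print(solver.__name__, accuracy_drop(solver, original, variants))
```

In a real comparison, each stand-in solver would be replaced by one of the three methods named above (a chain-of-thought prompt, a single code-generation-and-run pass, or an iterative code loop), and the same `accuracy_drop` measurement would be applied to each.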

Original article: arXiv cs.AI
Opening excerpt (first ~120 words)

arXiv:2605.26414 (cs) [Submitted on 26 May 2026]
Title: Reasoning, Code, or Both? How Large Language Models Handle Variations in Math Questions
Authors: Matthew Kutakh
Abstract: Large Language Models (LLMs) achieve impressive accuracy on mathematical reasoning benchmarks, yet their performance drops when problems are modified with simple changes like different names or numbers.

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.