Benchmarks in Leipzig

Jun 6, 2026 · 2:00 PM UTC · 3 min read

#mathematics #artificial intelligence #research #workshop #Andrei Balakin #Max Planck Institute for Mathematics #Leipzig #Marie-Charlotte Brandenburg #Veronica Calvo Cortes

TL;DR · WeSearch summary

A group of 49 mathematicians held a workshop in Leipzig to compile a dataset of research-level mathematics questions. The resulting collection of 100 questions was evaluated with several large language models (LLMs). The evaluation points to a marked improvement in the mathematical reasoning of LLMs: only two questions remained unsolved after extensive testing.

Key facts

▪ The workshop took place at the Max Planck Institute for Mathematics in the Sciences in Leipzig, Germany, from April 1 to May 15, 2026.
▪ The mathematicians evaluated the questions in three stages, starting with five state-of-the-art LLMs.
▪ Initially, 41 questions were unsolved, but this number decreased to only 2 after further evaluations.

Original article

arXiv.org


Opening excerpt (first ~120 words)

Mathematics > History and Overview
arXiv:2606.05818 (math)
[Submitted on 4 Jun 2026]
Title: Benchmarks in Leipzig
Authors: Andrei Balakin, Miklós Bóna, Marie-Charlotte Brandenburg, Clara Briand, Veronica Calvo Cortes, Shelby Cox, Jesus A. De Loera, Danai Deligeorgaki, Hannah Friedman, Tim Gehrunger, Chiara Giardino, Stephen Griffeth, Baran Hashemi, Elena Hoster, Alexander Ivanov, Nupur Jain, Aryaman Jal, Leonie Kayser, Joris Koefler, Kevin Kühn, Mario Kummer, Felix Lotter, René Marczinzik, Victor S. Miller, Alejandro Morales, Greta Panova, Gianni Petrella, Nathan Pflueger, Lakshmi Ramesh, Nikolas Rieke, Carlos Rodriguez, Andrea Rosana, Flavio Salizzoni, Otto T.P.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv.org.
