LMR-BENCH: Can LLM Agents Reproduce NLP Research Code? (EMNLP 2025)

May 22, 2026 · 12:16 PM UTC · 5 min read

TL;DR · WeSearch summary

A research team from the University of Texas at Dallas introduced LMR-BENCH at EMNLP 2025 to test whether LLM agents can reproduce NLP research code. The benchmark comprises 28 tasks derived from 23 NLP papers; in each task, an agent receives the paper, a partially masked codebase, and instructions, and must generate code that matches the original implementation. The evaluation measures both functional correctness and implementation fidelity, and it highlights the challenges LLMs face in multi-file reasoning and complex code structures.

Key facts

▪ LMR-BENCH consists of 28 reproduction tasks derived from 23 NLP papers published in the last five years.
▪ Each task requires agents to fill in masked code functions based on provided papers and instructions.
▪ The benchmark evaluates agents on functional correctness through unit tests and implementation fidelity using LLM assessments.
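
To make the unit-test half of that evaluation concrete, here is a minimal sketch of how a filled-in function might be scored. It is not the benchmark's actual harness: the `ReproductionTask` class, the `truncate_tokens` example, and both candidate functions are hypothetical and exist only for illustration. The LLM-based fidelity assessment is left out because it would require a model call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ReproductionTask:
    """Hypothetical stand-in for one task: a masked function plus its tests."""
    name: str
    # Each test pairs the call arguments with the expected return value.
    tests: List[Tuple[tuple, object]]


def run_unit_tests(candidate: Callable, task: ReproductionTask) -> bool:
    """Return True only if the candidate passes every test in the task."""
    for args, expected in task.tests:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # A crash counts as a failed reproduction.
            return False
    return True


def pass_rate(results: List[bool]) -> float:
    """Fraction of candidates (or tasks) that passed all of their tests."""
    return sum(results) / len(results) if results else 0.0


# Hypothetical task: the masked function should keep the first max_len tokens.
task = ReproductionTask(
    name="truncate_tokens",
    tests=[((["a", "b", "c"], 2), ["a", "b"]), (([], 3), [])],
)


def faithful(tokens, max_len):
    # Matches the assumed reference behaviour.
    return tokens[:max_len]


def off_by_one(tokens, max_len):
    # A plausible-looking but incorrect agent output.
    return tokens[:max_len - 1]


results = [run_unit_tests(faithful, task), run_unit_tests(off_by_one, task)]
print(pass_rate(results))  # 0.5
```

A check like this only captures functional correctness. A candidate could pass every test while departing from the method described in the paper, which is presumably why the benchmark pairs unit tests with a separate fidelity assessment.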

Original article

DEV.to (Top)

Opening excerpt (first ~120 words)

Jangwook Kim · Posted on May 22 · Originally published at effloow.com

#benchmark #researchreproducibility #llmagents #paperpoc

A research team from the University of Texas at Dallas published LMR-BENCH at EMNLP 2025, asking a specific question: can LLM agents reproduce the core implementation from an NLP research paper when given the paper, a partially masked codebase, and explicit instructions? This is harder than it…

Excerpt limited to ~120 words for fair-use compliance. The full article is at DEV.to (Top).
