StratRAG: A Multi-Hop Retrieval Evaluation Dataset for Retrieval-Augmented Generation Systems
StratRAG is an open-source evaluation dataset designed to benchmark Retrieval-Augmented Generation (RAG) systems on multi-hop reasoning tasks under realistic, noisy conditions. It includes 2,200 examples derived from HotpotQA, with three question types and document pools containing two relevant and thirteen distracting documents. The study evaluates BM25, dense, and hybrid retrieval methods, finding that hybrid retrieval performs best overall, though bridge questions remain challenging.
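The paper does not specify in this excerpt how BM25 and dense scores are combined, but a common way to build a hybrid retriever is reciprocal rank fusion (RRF). The sketch below is a hypothetical illustration under that assumption: it fuses two ranked lists over a toy 15-document pool (two relevant documents plus thirteen distractors, mirroring the dataset's pool structure) and measures recall over the fused ranking. Document ids, rankings, and the cutoff are invented for the example.

```python
# Hypothetical sketch: hybrid retrieval via reciprocal rank fusion (RRF).
# The paper's actual fusion method is not specified in this excerpt.
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Combine several ranked lists of doc ids into one hybrid ranking.

    rankings: list of lists, each an ordering of doc ids (best first).
    k: smoothing constant from the standard RRF formula 1 / (k + rank).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy pool of 15 docs (ids 0-14); docs 0 and 1 are the two relevant ones,
# the rest are distractors. Rankings are invented for illustration.
bm25_ranking = [3, 0, 7, 1] + [i for i in range(15) if i not in (3, 0, 7, 1)]
dense_ranking = [1, 5, 0, 2] + [i for i in range(15) if i not in (1, 5, 0, 2)]

hybrid = rrf_fuse([bm25_ranking, dense_ranking])
recall_at_4 = len({0, 1} & set(hybrid[:4])) / 2
```

Here both relevant documents land in the fused top 4 (recall@4 = 1.0) even though neither individual ranking places both in its top 2, which is the usual motivation for hybrid fusion.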
Opening excerpt (first ~120 words)
Computer Science > Information Retrieval
arXiv:2604.22757 (cs) [Submitted on 6 Mar 2026]
Title: StratRAG: A Multi-Hop Retrieval Evaluation Dataset for Retrieval-Augmented Generation Systems
Authors: Aryan Patodiya
Abstract: We introduce StratRAG, an open-source retrieval evaluation dataset for benchmarking Retrieval-Augmented Generation (RAG) systems on multi-hop reasoning tasks under realistic, noisy document-pool conditions.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.