M3DocDep: Multi-modal, Multi-page, Multi-document Dependency Chunking with Large Vision-Language Models

May 20, 2026 · 4:00 AM UTC

#information retrieval #artificial intelligence #document processing

TL;DR · WeSearch summary

The article presents M3DocDep, a method for processing long, multi-page documents with large vision-language models. It recovers block-level dependencies before forming retrieval units, so that chunk boundaries follow the document's structure. The reported results show gains in retrieval and answer quality metrics over existing methods.
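The idea of recovering block-level dependencies before forming retrieval units can be illustrated with a toy sketch. This is not the authors' implementation: the block records, the `parent` field, and the function name `group_blocks_by_root` are hypothetical, and the dependency links are simply given as input here rather than predicted by a vision-language model.

```python
from collections import defaultdict


def group_blocks_by_root(blocks):
    """Group layout blocks into chunks that share a dependency root.

    `blocks` is a list of dicts in reading order, each with the
    hypothetical keys 'id', 'page', 'text', and 'parent' (a block id,
    or None for a block that depends on nothing).
    """
    parent_of = {b["id"]: b["parent"] for b in blocks}

    def root(block_id):
        # Follow parent links upward; the `seen` set stops the walk
        # if the links accidentally form a cycle.
        seen = set()
        while parent_of[block_id] is not None and block_id not in seen:
            seen.add(block_id)
            block_id = parent_of[block_id]
        return block_id

    chunks = defaultdict(list)
    for block in blocks:
        # Blocks keep their reading order inside each chunk.
        chunks[root(block["id"])].append(block)
    return list(chunks.values())


# Toy input: a table on page 2 depends on a paragraph on page 1.
example_blocks = [
    {"id": "h1", "page": 1, "text": "3. Maintenance", "parent": None},
    {"id": "p1", "page": 1, "text": "Inspect the valve.", "parent": "h1"},
    {"id": "t1", "page": 2, "text": "Table (continued)", "parent": "p1"},
    {"id": "h2", "page": 2, "text": "4. Safety", "parent": None},
]
example_chunks = group_blocks_by_root(example_blocks)
```

In this toy input a page-based chunker would separate the page-2 table from the paragraph it continues, while grouping by dependency root keeps the two in the same retrieval unit.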

Key facts

▪ M3DocDep is designed to enhance retrieval-augmented generation in long, multi-page industrial documents.
▪ The method addresses issues with existing chunkers that fail to capture cross-page relationships and other structural cues.
▪ M3DocDep shows improvements in various benchmarks, including a 28.5 to 39.6 percent increase in STEDS and a 1.1 to 15.3 percent increase in retrieval nDCG.
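For readers unfamiliar with the retrieval metric named above, here is a minimal sketch of nDCG using the common log2 discount. The function names and the optional cutoff `k` are illustrative choices; the paper's exact evaluation setup (cutoff, relevance grading) is not stated in this summary. As a simplification, the ideal ranking is built from the same list of relevance labels that is being scored.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: rel_i / log2(i + 1), ranks from 1."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances, k=None):
    """Normalized DCG of a ranked list of graded relevance labels.

    `relevances[i]` is the relevance of the item at rank i + 1.
    If `k` is given, only the top k positions are scored.
    """
    ranked = relevances if k is None else relevances[:k]
    ideal = sorted(relevances, reverse=True)
    ideal = ideal if k is None else ideal[:k]
    ideal_dcg = dcg(ideal)
    # With no relevant items at all the score is defined as 0 here.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A ranking that already places the most relevant items first scores 1.0, and the score drops as relevant items move down the list.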

Original article

arXiv cs.AI

Opening excerpt (first ~120 words)

Computer Science > Information Retrieval
arXiv:2605.18774 (cs) [Submitted on 17 Apr 2026]
Title: M3DocDep: Multi-modal, Multi-page, Multi-document Dependency Chunking with Large Vision-Language Models
Authors: Joongmin Shin, Jeongbae Park, Jaehyung Seo, Heuiseok Lim
Abstract: In long, multi-page industrial documents, retrieval-augmented generation (RAG) depends heavily on whether chunk boundaries follow the document's true structure.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
