
arxiv: 2606.07235 · v2 · pith:CB6ZWQBGnew · submitted 2026-06-05 · 💻 cs.IR · cs.LG

FLOWREADER: Min-Cost Flow Optimization for Multi-Modal Long Document Q&A

Pith reviewed 2026-06-27 20:45 UTC · model grok-4.3

classification 💻 cs.IR cs.LG
keywords min-cost flow · multimodal retrieval · long document QA · evidence assembly · fragmented evidence · VisDoMBench

The pith

Min-cost flow on a multimodal graph assembles fragmented evidence from long documents into answer paths.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents FLOWREADER to solve long-document multimodal Q&A where evidence fragments across text, tables, and slides. It casts evidence assembly as a min-cost flow problem whose single scoring vector simultaneously chooses sources via MMR, chooses sinks via length-aware answerability, and sets every edge cost and capacity. The resulting flow is broken into paths, a compact subset is chosen by entropy-regularized replicator dynamics, and answers are generated by parallel VLMs with an optional refinement pass. The method records its largest gains precisely on the two subsets whose evidence is most fragmented. A reader would care because standard top-k chunk retrieval treats fragments independently and therefore cannot exploit the connections the flow is designed to capture.
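To make the source-selection step concrete, here is a minimal Python sketch of greedy Maximal Marginal Relevance (MMR) in its standard form. The function names, the toy relevance scores and embeddings, and the trade-off weight `lam` are illustrative assumptions; the abstract does not say how MMR is parameterized from the scoring vector.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mmr_select(relevance, embeddings, k, lam=0.7):
    """Greedy MMR: repeatedly pick the item that maximizes
    lam * relevance - (1 - lam) * (max similarity to items already picked)."""
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            redundancy = max(
                (cosine(embeddings[i], embeddings[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy data: nodes 0 and 1 are near-duplicates, node 2 is distinct.
relevance = [0.9, 0.85, 0.4]
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

sources = mmr_select(relevance, embeddings, k=2, lam=0.5)
print(sources)  # [0, 2]: the duplicate of node 0 is skipped
```

With `lam=1.0` the redundancy term vanishes and the same call falls back to plain top-k by relevance, which is the behaviour the flow formulation is contrasted against.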

Core claim

FLOWREADER reframes evidence assembly as a min-cost flow problem on a multimodal node graph: a single scoring vector h controls source selection via MMR, sink selection via a length-aware answerability proxy, and the costs and capacities of every edge. The optimal flow is decomposed into candidate evidence paths, a compact non-redundant subset is selected by entropy-regularized replicator dynamics, and parallel VLM workers under a dual-process gate produce the answer with a single System-2 refinement pass triggered when answer consistency is low or the routed flow is strained. On VisDoMBench it is strongest on the fragmented-evidence subsets PaperTab and SlideVQA and competitive on the remai

What carries the argument

Min-cost flow on a multimodal node graph whose single scoring vector h jointly governs source selection, sink selection, edge costs, and capacities.
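As a rough illustration of this idea, the sketch below solves a tiny min-cost flow instance with a textbook successive-shortest-paths routine (Bellman-Ford on the residual graph). The mapping from the scoring vector to edge costs (`1 - h[v]`) and the unit capacities are assumptions made only for the example; the abstract does not give the paper's actual functional forms.

```python
def min_cost_flow(n, edges, s, t, budget):
    """Successive shortest paths. edges: list of (u, v, capacity, cost).
    Returns (flow_sent, total_cost, flow_on_each_input_edge)."""
    graph = [[] for _ in range(n)]  # adjacency lists of edge indices
    to, cap, cost = [], [], []
    for u, v, c, w in edges:
        # Forward edge at an even index, its residual twin at index ^ 1.
        graph[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)
    sent, total = 0, 0.0
    inf = float("inf")
    while sent < budget:
        dist = [inf] * n
        prev_edge = [-1] * n
        dist[s] = 0.0
        for _ in range(n - 1):  # Bellman-Ford relaxation rounds
            updated = False
            for u in range(n):
                if dist[u] == inf:
                    continue
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]] - 1e-12:
                        dist[to[e]] = dist[u] + cost[e]
                        prev_edge[to[e]] = e
                        updated = True
            if not updated:
                break
        if dist[t] == inf:
            break  # no augmenting path left
        push, v = budget - sent, t
        while v != s:  # bottleneck capacity along the path
            e = prev_edge[v]
            push = min(push, cap[e])
            v = to[e ^ 1]
        v = t
        while v != s:  # augment along the path
            e = prev_edge[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]
        sent += push
        total += push * dist[t]
    flows = [cap[2 * i + 1] for i in range(len(edges))]
    return sent, total, flows


# Hypothetical scores: node 0 = source paragraph, 1 = table, 2 = figure, 3 = sink.
h = [0.9, 0.8, 0.3, 0.7]
links = [(0, 1), (0, 2), (1, 3), (2, 3)]
# Assumed mapping: entering a high-scoring node is cheap; every edge has capacity 1.
edges = [(u, v, 1, 1.0 - h[v]) for u, v in links]

sent, total, flows = min_cost_flow(len(h), edges, s=0, t=3, budget=1)
print(sent, round(total, 6), flows)  # 1 0.5 [1, 0, 1, 0]
```

With an evidence budget of one unit, the flow takes the route through the higher-scoring table node. Raising `budget` to 2 forces the second unit through the figure node because the first route is saturated.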

If this is right

  • Min-cost flow yields connected evidence paths that top-k retrieval misses on documents with evidence split across modalities.
  • A single vector unifies control of scoring, routing, selection, and adaptive compute in one optimization.
  • Performance gains concentrate on the two subsets whose evidence is most fragmented (PaperTab +1.30, SlideVQA +0.62).
  • The overall macro-average remains within 0.74 of the strongest baseline while adding explicit path connectivity.
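The figures quoted in the list above can be cross-checked with simple arithmetic. The snippet below uses only numbers stated in the abstract; the "implied baseline" values are derived by subtraction and assume each delta is measured against the strongest baseline on that subset.

```python
# Scores and deltas as quoted in the abstract.
flowreader = {"PaperTab": 58.40, "SlideVQA": 72.93, "macro": 65.47}
gain = {"PaperTab": 1.30, "SlideVQA": 0.62}
strongest_baseline_macro = 66.21

# Baseline score implied by each reported gain (derived, not quoted).
implied_baseline = {k: round(flowreader[k] - gain[k], 2) for k in gain}
macro_gap = round(strongest_baseline_macro - flowreader["macro"], 2)

print(implied_baseline)  # {'PaperTab': 57.1, 'SlideVQA': 72.31}
print(macro_gap)         # 0.74
```

The subset gains and the macro-average deficit are of similar size, so FLOWREADER must trail on at least one of the remaining subsets.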

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same flow formulation could be applied to any retrieval setting where evidence must be chained rather than ranked independently, such as multi-hop question answering over knowledge graphs.
  • The dual-process gate offers a concrete mechanism for trading compute against answer consistency that could be tested on other generative retrieval pipelines.
  • Because the graph construction and flow objective are modality-agnostic, the approach could be extended to additional input types such as audio transcripts or video keyframes without redesigning the optimizer.
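The dual-process gate mentioned above can be pictured as a simple predicate over a few signals. The sketch below follows the three triggers named in the Figure 1 caption (low answer consistency, flow saturation, insufficient coverage); the signal definitions, the class and function names, and every threshold are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class GateSignals:
    answer_consistency: float  # share of workers agreeing with the majority answer
    flow_saturation: float     # share of routed flow on edges at capacity (assumed definition)
    coverage: float            # share of the evidence budget actually routed (assumed definition)


def majority_consistency(answers):
    """Fraction of worker answers that match the most common answer."""
    if not answers:
        return 0.0
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)


def needs_system2(sig, min_consistency=0.6, max_saturation=0.9, min_coverage=0.5):
    """Trigger one refinement pass if any signal crosses its (hypothetical) threshold."""
    return (
        sig.answer_consistency < min_consistency
        or sig.flow_saturation >= max_saturation
        or sig.coverage < min_coverage
    )


agree = GateSignals(majority_consistency(["42", "42", "41"]), 0.4, 0.8)
split = GateSignals(majority_consistency(["42", "41", "40"]), 0.4, 0.8)
print(needs_system2(agree), needs_system2(split))  # False True
```

Testing the gate on another pipeline would mostly mean swapping in that pipeline's own notion of consistency and coverage while keeping the trigger logic fixed.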

Load-bearing premise

One scoring vector can simultaneously and effectively set source selection, sink selection, and all edge costs and capacities so that the resulting flow produces useful evidence paths.

What would settle it

A controlled experiment that replaces the min-cost flow step with independent top-k selection on the identical multimodal graph and measures whether accuracy on the fragmented subsets drops, stays the same, or rises.

Figures

Figures reproduced from arXiv: 2606.07235 by Ambuj Mehrish, Sebastiano Vascon.

Figure 1: FLOWREADER pipeline. (1) A multimodal graph encodes text, table, and image nodes. (2) At query time, sources and sinks are selected and edge costs $c_{uv}$ are assigned. (3) System 1 solves a min-cost flow and decomposes the optimum into evidence paths $\{\pi_k\}$. (4) A retrieval-grounded gate triggers a single System 2 refinement pass when low answer consistency, flow saturation, or insufficient coverage indicates t…
Figure 2: RD path selection in FLOWREADER. Candidate paths $\{(\pi_k, b_k)\}$ from min-cost flow decomposition are combined into a quality–diversity weight matrix $A$ via composite quality $q_k$ and pairwise Jaccard non-overlap $d_{kl}$. Entropy-regularised replicator dynamics on $A$ prunes paths below $\theta_{\text{extinct}}$, yielding a compact non-redundant subset that is passed to parallel VLM workers. Min-cost flow. With evidence budget $F$, the ro…
Figure 3: (a) Quality–diversity weight matrix $A$ over $K$ flow-decomposed paths (sorted by $q_k$): diagonal entries encode path quality ($\omega_{kk} = q_k^{2}$), off-diagonals encode quality-weighted Jaccard non-overlap ($\omega_{kl} = \sqrt{q_k q_l}\, d_{kl}$); entropy-regularised replicator updates concentrate the converged population $x^{\star}$ on the support $\Pi^{\star}$ ($|\Pi^{\star}| = 11$) above $\theta_{\text{extinct}}$. (b) System 2 accuracy gain vs. trigger rate: SciGraphQA and FetaTab (…
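The weight matrix described in the Figure 3 caption can be built and iterated in a few lines. The sketch below uses the caption's entries ($q_k^2$ on the diagonal, $\sqrt{q_k q_l}\, d_{kl}$ off the diagonal, with $d_{kl}$ taken as one minus Jaccard overlap) and a plain discrete-time replicator update. The entropy regularisation is omitted because its form is not given in the available text, and the toy paths, qualities, iteration count, and threshold value are assumptions.

```python
import math


def jaccard_non_overlap(a, b):
    """1 - |a & b| / |a | b| for two sets of node ids."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def build_weight_matrix(paths, quality):
    """Diagonal: q_k^2. Off-diagonal: sqrt(q_k * q_l) * d_kl."""
    k = len(paths)
    A = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                A[i][j] = quality[i] ** 2
            else:
                d = jaccard_non_overlap(paths[i], paths[j])
                A[i][j] = math.sqrt(quality[i] * quality[j]) * d
    return A


def replicator_select(A, iters=500, theta_extinct=1e-3):
    """Plain replicator dynamics x_i <- x_i * (Ax)_i / (x^T A x),
    followed by pruning of entries at or below theta_extinct."""
    k = len(A)
    x = [1.0 / k] * k
    for _ in range(iters):
        fitness = [sum(A[i][j] * x[j] for j in range(k)) for i in range(k)]
        mean_fitness = sum(x[i] * fitness[i] for i in range(k))
        if mean_fitness <= 0:
            break
        x = [x[i] * fitness[i] / mean_fitness for i in range(k)]
    support = [i for i in range(k) if x[i] > theta_extinct]
    return x, support


# Hypothetical candidates: path 1 duplicates path 0 with lower quality.
paths = [{1, 2, 3}, {1, 2, 3}, {4, 5}]
quality = [0.9, 0.5, 0.8]

x, support = replicator_select(build_weight_matrix(paths, quality))
print(support)  # [0, 2]: the redundant path dies out
```

Because duplicate paths have zero non-overlap, they contribute nothing to each other's fitness, so the weaker copy is driven below the extinction threshold while the disjoint path survives.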
Original abstract

Long, multimodal documents force retrieval-augmented systems to assemble answers from evidence fragmented across text, tables, and slides: broken across cells in a long table, spread over multiple slides, or split between a figure and its discussion. Top-$k$ chunk retrieval treats each fragment independently and cannot represent how evidence connects. We introduce FLOWREADER, which reframes evidence assembly as a min-cost flow problem on a multimodal node graph: a single scoring vector $h$ controls source selection (via MMR), sink selection (via a length-aware answerability proxy), and the costs and capacities of every edge. The optimal flow is decomposed into candidate evidence paths, a compact non-redundant subset is selected by entropy-regularized replicator dynamics, and parallel VLM workers under a dual-process gate produce the answer with a single System-2 refinement pass triggered when answer consistency is low or the routed flow is strained. On VisDoMBench, FLOWREADER is best on the two subsets dominated by fragmented evidence, PaperTab ($58.40$, $+1.30$ over G$^{2}$-Reader) and SlideVQA ($72.93$, $+0.62$), and competitive on SPIQA, FetaTab, and SciGraphQA. Macro-averaged across all five subsets, FLOWREADER ($65.47$) is within $0.74$ of the strongest baseline (G$^{2}$-Reader, $66.21$). Overall, these results show that min-cost flow performs well on fragmented multimodal evidence, where top-$k$ retrieval fails. It also provides a unified way to control scoring, routing, selection, and adaptive compute together.
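The step in which the optimal flow is decomposed into candidate evidence paths can be sketched with the classical greedy path-peeling procedure. The version below assumes an acyclic, conserved source-to-sink flow given as a dictionary of edge flows; the node names and flow values are made up for illustration and are not taken from the paper.

```python
def decompose_flow(flow, source, sink):
    """Peel (path, amount) pairs off an acyclic, conserved source-to-sink flow.
    flow: dict mapping (u, v) -> non-negative flow on that edge."""
    residual = {e: f for e, f in flow.items() if f > 0}
    paths = []
    while True:
        path, node = [source], source
        while node != sink:
            # Follow any outgoing edge that still carries flow.
            nxt = next(
                (v for (u, v), f in residual.items() if u == node and f > 0),
                None,
            )
            if nxt is None:
                break
            path.append(nxt)
            node = nxt
        if node != sink:
            break  # no flow left leaving the source
        amount = min(residual[(u, v)] for u, v in zip(path, path[1:]))
        for u, v in zip(path, path[1:]):
            residual[(u, v)] -= amount
        paths.append((path, amount))
    return paths


# Hypothetical flow: two units leave a text node, one via a table, one via a figure.
flow = {
    ("s", "text"): 2,
    ("text", "table"): 1,
    ("text", "fig"): 1,
    ("table", "t"): 1,
    ("fig", "t"): 1,
}
for path, amount in decompose_flow(flow, "s", "t"):
    print(amount, " -> ".join(path))
```

Each peeled path removes at least one edge from the residual, so the loop terminates after at most as many rounds as there are flow-carrying edges.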

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces FLOWREADER, which models evidence assembly for long multimodal documents as a min-cost flow problem on a multimodal node graph. A single scoring vector h jointly controls source selection via MMR, sink selection via length-aware answerability, and all edge costs/capacities. Optimal flow is decomposed into paths, a subset selected via entropy-regularized replicator dynamics, and answers generated by parallel VLM workers with a dual-process gate for refinement. On VisDoMBench, it reports best results on PaperTab (58.40, +1.30 over G²-Reader) and SlideVQA (72.93, +0.62), competitive elsewhere, with macro-average 65.47 within 0.74 of the top baseline.

Significance. If the joint parameterization of the flow network by h is shown to be stable and the flow mechanism demonstrably responsible for gains on fragmented-evidence subsets, the work would provide a unified optimization framework for retrieval, diversity, connectivity, and adaptive computation in multimodal RAG, addressing a clear limitation of independent top-k chunking.

major comments (2)
  1. [Abstract] The central claim attributes the +1.30 and +0.62 gains on PaperTab and SlideVQA specifically to min-cost flow whose source (MMR), sink (length-aware answerability), and all edge costs/capacities are controlled by one vector h. No equations, loss, or optimization procedure are supplied showing that a single h can satisfy these constraints simultaneously without unacceptable trade-offs (e.g., MMR-driven diversity inflating cross-modal edge costs). This joint-control assumption is load-bearing for the reported improvements over top-k baselines.
  2. [Abstract] The manuscript states that flow decomposition plus replicator selection yields useful evidence paths on fragmented subsets, yet supplies neither the explicit capacity/cost definitions in terms of h nor any ablation isolating the flow component from the dual-process gate or VLM workers. Without these, it is impossible to verify that the optimization, rather than other modules, drives the subset-specific gains.
minor comments (2)
  1. [Abstract] Benchmark deltas are reported without error bars, number of runs, or statistical significance tests, which are standard when claiming superiority on specific subsets.
  2. [Abstract] Although the paper does not describe the method as parameter-free, the claim of unified control by a single h would benefit from an explicit statement of whether h is learned or hand-tuned and how its dimensionality relates to the graph size.
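One common way to attach uncertainty to a subset-level delta, as the first minor comment requests, is a paired percentile bootstrap over per-question scores. The sketch below is a generic illustration of that technique, not the paper's evaluation protocol; the per-question scores are synthetic placeholders and the function name is hypothetical.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(scores_a - scores_b),
    resampling questions with replacement."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have equal length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Synthetic per-question correctness for two systems (placeholders, not real results).
system_a = [1, 1, 0, 1, 1, 0, 1, 1]
system_b = [1, 0, 0, 1, 0, 0, 1, 1]

lo, hi = paired_bootstrap_ci(system_a, system_b)
print(f"95% interval for the mean difference: [{lo:.3f}, {hi:.3f}]")
```

If the resulting interval excludes zero, the gain is unlikely to be an artifact of which questions happened to be sampled; it says nothing about variance across random seeds or runs, which would need separate reporting.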

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address the two major comments point by point below and will revise the manuscript accordingly to supply the requested technical details.

  1. Referee: [Abstract] The central claim attributes the +1.30 and +0.62 gains on PaperTab and SlideVQA specifically to min-cost flow whose source (MMR), sink (length-aware answerability), and all edge costs/capacities are controlled by one vector h. No equations, loss, or optimization procedure are supplied showing that a single h can satisfy these constraints simultaneously without unacceptable trade-offs (e.g., MMR-driven diversity inflating cross-modal edge costs). This joint-control assumption is load-bearing for the reported improvements over top-k baselines.

    Authors: We agree that the abstract does not contain equations or the optimization procedure. The abstract is space-constrained, but the joint-control claim requires clearer support. In revision we will expand the abstract (or add a short methods summary) to state how the single vector h simultaneously parameterizes MMR source scores, length-aware sink scores, and all edge costs/capacities, and we will briefly describe the min-cost flow objective and solver used. We will also add a short paragraph discussing why the formulation avoids unacceptable trade-offs (the global cost minimization couples the terms rather than treating them independently). revision: yes

  2. Referee: [Abstract] The manuscript states that flow decomposition plus replicator selection yields useful evidence paths on fragmented subsets, yet supplies neither the explicit capacity/cost definitions in terms of h nor any ablation isolating the flow component from the dual-process gate or VLM workers. Without these, it is impossible to verify that the optimization, rather than other modules, drives the subset-specific gains.

    Authors: We accept that explicit capacity/cost definitions in terms of h and an isolating ablation are not present at the level of detail needed. In the revision we will (1) state the precise functional forms mapping h to edge capacities and costs, and (2) add a targeted ablation that removes or replaces the min-cost flow module while keeping the dual-process gate and VLM workers fixed, reporting results on the fragmented-evidence subsets. This will allow direct verification of the flow component's contribution. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical method with external optimization and benchmark evaluation


The paper presents FLOWREADER as a min-cost flow formulation on a multimodal graph controlled by scoring vector h, with flow decomposition and replicator selection, evaluated empirically on VisDoMBench subsets. No equations, self-citations, or derivations are provided that reduce the method definition or performance claims to fitted inputs or self-referential quantities by construction. The central results are comparative scores against external baselines rather than tautological predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so free parameters, axioms, and invented entities cannot be exhaustively audited. The scoring vector h appears central but its training or fitting procedure is not specified.

pith-pipeline@v0.9.1-grok · 5834 in / 1255 out tokens · 25609 ms · 2026-06-27T20:45:48.976728+00:00 · methodology


Reference graph

Works this paper leans on

108 extracted references · 1 canonical work pages

  1. [1]

    Procedures for synthesizing ratio judgements.Journal of mathematical Psychology, 27(1):93–102, 1983

    Janos Aczél and Thomas L Saaty. Procedures for synthesizing ratio judgements.Journal of mathematical Psychology, 27(1):93–102, 1983

  2. [2]

    Large language models hallucination: A comprehensive survey.Computer Science Review, 61:100970, 2026

    Aisha Alansari and Hamzah Luqman. Large language models hallucination: A comprehensive survey.Computer Science Review, 61:100970, 2026

  3. [3]

    Entropy driven transformations of statistical hypersurfaces.Reviews in Mathematical Physics, 33(02):2150001, 2021

    Mario Angelelli and Boris Konopelchenko. Entropy driven transformations of statistical hypersurfaces.Reviews in Mathematical Physics, 33(02):2150001, 2021

  4. [4]

    Self-rag: Learning to retrieve, gener- ate, and critique through self-reflection

    Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. Self-rag: Learning to retrieve, gener- ate, and critique through self-reflection. InThe Twelfth International Conference on Learning Representations, 2023

  5. [5]

    Injecting the bm25 score as text improves bert-based re-rankers

    Arian Askari, Amin Abolghasemi, Gabriella Pasi, Wessel Kraaij, and Suzan Verberne. Injecting the bm25 score as text improves bert-based re-rankers. InEuropean Conference on Information Retrieval, pages 66–83. Springer, 2023. 4https://www.hpc.cineca.it 9 FLOWREADER: Min-Cost Flow Optimization for Multi-Modal Long Document Q&AA PREPRINT

  6. [6]

    Relative entropy in biological systems.Entropy, 18(2):46, 2016

    John C Baez and Blake S Pollard. Relative entropy in biological systems.Entropy, 18(2):46, 2016

  7. [7]

    Qwen3-vl technical report.arXiv preprint arXiv:2511.21631, 2025

    Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-vl technical report.arXiv preprint arXiv:2511.21631, 2025

  8. [8]

    Approximating the maximum weight clique using replicator dynamics.IEEE Transactions on neural networks, 11(6):1228–1241, 2000

    IR Bomze, Marcello Pelillo, and V olker Stix. Approximating the maximum weight clique using replicator dynamics.IEEE Transactions on neural networks, 11(6):1228–1241, 2000

  9. [9]

    The use of mmr, diversity-based reranking for reordering documents and producing summaries

    Jaime Carbonell and Jade Goldstein. The use of mmr, diversity-based reranking for reordering documents and producing summaries. InProceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pages 335–336, 1998

  10. [10]

    Dualrag: A dual-process approach to integrate reasoning and retrieval for multi-hop question answering

    Rong Cheng, Jinyi Liu, Yan Zheng, Fei Ni, Jiazhen Du, Hangyu Mao, Fuzheng Zhang, Bo Wang, and Jianye Hao. Dualrag: A dual-process approach to integrate reasoning and retrieval for multi-hop question answering. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 31877–31899, 2025

  11. [11]

    Mem0: Building production- ready ai agents with scalable long-term memory.arXiv preprint arXiv:2504.19413, 2025

    Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production- ready ai agents with scalable long-term memory.arXiv preprint arXiv:2504.19413, 2025

  12. [12]

    M3docvqa: Multi-modal multi-page multi-document understanding

    Jaemin Cho, Debanjan Mahata, Ozan Irsoy, Yujie He, and Mohit Bansal. M3docvqa: Multi-modal multi-page multi-document understanding. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 6178–6188, 2025

  13. [13]

    Improving the similarity measure of determinantal point processes for extractive multi-document summarization

    Sangwoo Cho, Logan Lebanoff, Hassan Foroosh, and Fei Liu. Improving the similarity measure of determinantal point processes for extractive multi-document summarization. InProceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1027–1038, 2019

  14. [14]

    Reducing hallucinations of medical multimodal large language models with visual retrieval-augmented generation.arXiv preprint arXiv:2502.15040, 2025

    Yun-Wei Chu, Kai Zhang, Christopher Malon, and Martin Renqiang Min. Reducing hallucinations of medical multimodal large language models with visual retrieval-augmented generation.arXiv preprint arXiv:2502.15040, 2025

  15. [15]

    Sinkhorn distances: Lightspeed computation of optimal transport.Advances in neural information processing systems, 26, 2013

    Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport.Advances in neural information processing systems, 26, 2013

  16. [16]

    A dataset of information- seeking questions and answers anchored in research papers

    Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A Smith, and Matt Gardner. A dataset of information- seeking questions and answers anchored in research papers. InProceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4599–4610, 2021

  17. [17]

    Mc-indexing: Effective long document retrieval via multi-view content-aware indexing

    Kuicai Dong, Derrick Goh Xin Deik, Yi Quan Lee, Hao Zhang, Xiangyang Li, Cong Zhang, and Yong Liu. Mc-indexing: Effective long document retrieval via multi-view content-aware indexing. InFindings of the Association for Computational Linguistics: EMNLP 2024, pages 2673–2691, 2024

  18. [18]

    g2-Reader: Dual Evolving Graphs for Multimodal Document Comprehension.arXiv preprint arXiv:2601.22055, 2026

    Yaxin Du, Junru Song, Yifan Zhou, Cheng Wang, Jiahao Gu, Zimeng Chen, Menglan Chen, Wen Yao, Yang Yang, Ying Wen, et al. g2-Reader: Dual Evolving Graphs for Multimodal Document Comprehension.arXiv preprint arXiv:2601.22055, 2026

  19. [19]

    Meaningful environmental indices: a social choice approach.Journal of Environ- mental Economics and Management, 47(2):270–283, 2004

    Udo Ebert and Heinz Welsch. Meaningful environmental indices: a social choice approach.Journal of Environ- mental Economics and Management, 47(2):270–283, 2004

  20. [20]

    From local to global: A graph rag approach to query-focused summarization.arXiv preprint arXiv:2404.16130, 2024

    Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. From local to global: A graph rag approach to query-focused summarization.arXiv preprint arXiv:2404.16130, 2024

  21. [21]

    Colpali: Efficient document retrieval with vision language models.arXiv preprint arXiv:2407.01449, 2024

    Manuel Faysse, Hugues Sibille, Tony Wu, Bilel Omrani, Gautier Viaud, Céline Hudelot, and Pierre Colombo. Colpali: Efficient document retrieval with vision language models.arXiv preprint arXiv:2407.01449, 2024

  22. [22]

    A scalable global model for summarization

    Dan Gillick and Benoit Favre. A scalable global model for summarization. InProceedings of the workshop on integer linear programming for natural language processing, pages 10–18, 2009

  23. [23]

    A decomposition theorem for dynamic flows.arXiv preprint arXiv:2407.04761, 2024

    Lukas Graf, Tobias Harks, and Julian Schwarz. A decomposition theorem for dynamic flows.arXiv preprint arXiv:2407.04761, 2024

  24. [24]

    Dior: Adaptive cognitive detection and contextual retrieval optimization for dynamic retrieval-augmented generation

    Hanghui Guo, Jia Zhu, Shimin Di, Weijie Shi, Zhangze Chen, and Jiajie Xu. Dior: Adaptive cognitive detection and contextual retrieval optimization for dynamic retrieval-augmented generation. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2953–2975, 2025

  25. [25]

    Lightrag: Simple and fast retrieval-augmented generation.arXiv preprint arXiv:2410.05779, 2(3), 2024

    Zirui Guo, Lianghao Xia, Yanhua Yu, Tian Ao, and Chao Huang. Lightrag: Simple and fast retrieval-augmented generation.arXiv preprint arXiv:2410.05779, 2(3), 2024. 10 FLOWREADER: Min-Cost Flow Optimization for Multi-Modal Long Document Q&AA PREPRINT

  26. [26]

    Rag-anything: All-in-one rag framework

    Zirui Guo, Xubin Ren, Lingrui Xu, Jiahao Zhang, and Chao Huang. Rag-anything: All-in-one rag framework. arXiv preprint arXiv:2510.12323, 2025

  27. [27]

    Hipporag: Neurobiologically inspired long-term memory for large language models.Advances in neural information processing systems, 37:59532– 59569, 2024

    Bernal J Gutiérrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, and Yu Su. Hipporag: Neurobiologically inspired long-term memory for large language models.Advances in neural information processing systems, 37:59532– 59569, 2024

  28. [28]

    Topic-sensitive pagerank

    Taher H Haveliwala. Topic-sensitive pagerank. InProceedings of the 11th international conference on World Wide Web, pages 517–526, 2002

  29. [29]

    Soft quality-diversity optimization.arXiv preprint arXiv:2512.00810, 2025

    Saeed Hedayatian and Stefanos Nikolaidis. Soft quality-diversity optimization.arXiv preprint arXiv:2512.00810, 2025

  30. [30]

    Enhancing the precision and interpretability of retrieval-augmented generation (rag) in legal technology: A survey.IEEE Access, 2025

    Mahd Hindi, Linda Mohammed, Ommama Maaz, and Abdulmalik Alwarafy. Enhancing the precision and interpretability of retrieval-augmented generation (rag) in legal technology: A survey.IEEE Access, 2025

  31. [31]

    Cog-rag: Cognitive-inspired dual-hypergraph with theme alignment retrieval-augmented generation

    Hao Hu, Yifan Feng, Ruoxue Li, Rundong Xue, Xingliang Hou, Zhiqiang Tian, Yue Gao, and Shaoyi Du. Cog-rag: Cognitive-inspired dual-hypergraph with theme alignment retrieval-augmented generation. InProceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 31032–31040, 2026

  32. [32]

    Uda: A benchmark suite for retrieval augmented generation in real-world document analysis.Advances in Neural Information Processing Systems, 37:67200–67217, 2024

    Yulong Hui, Yao Lu, and Huanchen Zhang. Uda: A benchmark suite for retrieval augmented generation in real-world document analysis.Advances in Neural Information Processing Systems, 37:67200–67217, 2024

  33. [33]

    Adaptive-rag: Learning to adapt retrieval-augmented large language models through question complexity

    Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, and Jong C Park. Adaptive-rag: Learning to adapt retrieval-augmented large language models through question complexity. InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 7036–...

  34. [34]

    Active retrieval augmented generation

    Zhengbao Jiang, Frank F Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig. Active retrieval augmented generation. InProceedings of the 2023 conference on empirical methods in natural language processing, pages 7969–7992, 2023

  35. [35]

    Determinantal point processes for machine learning.Foundations and Trends® in Machine Learning, 5(2-3):123–286, 2012

    Alex Kulesza and Ben Taskar. Determinantal point processes for machine learning.Foundations and Trends® in Machine Learning, 5(2-3):123–286, 2012

  36. [36]

    From word embeddings to document distances

    Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. From word embeddings to document distances. In International conference on machine learning, pages 957–966. PMLR, 2015

  37. [37]

    Retrieval-augmented generation for knowledge- intensive nlp tasks.Advances in neural information processing systems, 33:9459–9474, 2020

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge- intensive nlp tasks.Advances in neural information processing systems, 33:9459–9474, 2020

  38. [38]

    Smart-rag: Selection using determinantal matrices for augmented retrieval

    Jiatao Li, Xinyu Hu, and Xiaojun Wan. Smart-rag: Selection using determinantal matrices for augmented retrieval. arXiv preprint arXiv:2409.13992, 2024

  39. [39]

    Graphreader: Building graph-based agent to enhance long-context abilities of large language models

    Shilong Li, Yancheng He, Hangyu Guo, Xingyuan Bu, Ge Bai, Jie Liu, Jiaheng Liu, Xingwei Qu, Yangguang Li, Wanli Ouyang, et al. Graphreader: Building graph-based agent to enhance long-context abilities of large language models. InFindings of the Association for Computational Linguistics: EMNLP 2024, pages 12758–12786, 2024

  40. [40]

    Reasoning rag via system 1 or system 2: A survey on reasoning agentic retrieval-augmented generation for industry challenges

    Jintao Liang, Huifeng Lin, You Wu, Rui Zhao, Ziyue Li, et al. Reasoning rag via system 1 or system 2: A survey on reasoning agentic retrieval-augmented generation for industry challenges. InProceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computa...

  41. [41]

    Multi-document summarization via budgeted maximization of submodular functions

    Hui Lin and Jeff Bilmes. Multi-document summarization via budgeted maximization of submodular functions. In Human Language Technologies: The 2010 Annual conference of the North American chapter of the association for computational linguistics, pages 912–920, 2010

  42. [42]

    A class of submodular functions for document summarization

    Hui Lin and Jeff Bilmes. A class of submodular functions for document summarization. InProceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, pages 510–520, 2011

  43. [43]

    A study of global inference algorithms in multi-document summarization

    Ryan McDonald. A study of global inference algorithms in multi-document summarization. InEuropean conference on information retrieval, pages 557–564. Springer, 2007

  44. [44]

    From ranking to selection: A simple but efficient dynamic passage selector for retrieval augmented generation

    Siyuan Meng, Junming Liu, Yirong Chen, Song Mao, Pinlong Cai, Guohang Yan, Botian Shi, and Ding Wang. From ranking to selection: A simple but efficient dynamic passage selector for retrieval augmented generation. arXiv preprint arXiv:2508.09497, 2025

  45. [45]

    Fetaqa: Free-form table question answering.Transactions of the Association for Computational Linguistics, 10:35–49, 2022

    Linyong Nan, Chiachun Hsieh, Ziming Mao, Xi Victoria Lin, Neha Verma, Rui Zhang, Wojciech Kry ´sci´nski, Hailey Schoelkopf, Riley Kong, Xiangru Tang, et al. Fetaqa: Free-form table question answering.Transactions of the Association for Computational Linguistics, 10:35–49, 2022. 11 FLOWREADER: Min-Cost Flow Optimization for Multi-Modal Long Document Q&AA PREPRINT

  46. [46]

    Ma-rag: Multi-agent retrieval-augmented generation via collabora- tive chain-of-thought reasoning.arXiv preprint arXiv:2505.20096, 2025

    Thang Nguyen, Peter Chin, and Yu-Wing Tai. Ma-rag: Multi-agent retrieval-augmented generation via collabora- tive chain-of-thought reasoning.arXiv preprint arXiv:2505.20096, 2025

  47. [47]

    Morris, Brandon Duderstadt, and Andriy Mulyar

    Zach Nussbaum, John X. Morris, Brandon Duderstadt, and Andriy Mulyar. Nomic embed: Training a reproducible long context text embedder, 2024

  48. [48]

    Dominant sets and pairwise clustering.IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(1):167–172, 2007

    Massimiliano Pavan and Marcello Pelillo. Dominant sets and pairwise clustering.IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(1):167–172, 2007. doi: 10.1109/TPAMI.2007.250608

  49. [49]

    Replicator equations, maximal cliques, and graph isomorphism.Advances in Neural Information Processing Systems, 11, 1998

    Marcello Pelillo. Replicator equations, maximal cliques, and graph isomorphism.Advances in Neural Information Processing Systems, 11, 1998

  50. [50]

    Spiqa: A dataset for multimodal question answering on scientific papers.Advances in Neural Information Processing Systems, 37:118807–118833, 2024

    Shraman Pramanick, Rama Chellappa, and Subhashini Venugopalan. Spiqa: A dataset for multimodal question answering on scientific papers.Advances in Neural Information Processing Systems, 37:118807–118833, 2024

  51. [51]

    Measuring and narrowing the compositionality gap in language models

    Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A Smith, and Mike Lewis. Measuring and narrowing the compositionality gap in language models. InFindings of the Association for Computational Linguistics: EMNLP 2023, pages 5687–5711, 2023

  52. [52]

    Pairwise interactions origin of entropy functions.arXiv preprint arXiv:1506.05731, 2015

    Yuri Pykh. Pairwise interactions origin of entropy functions.arXiv preprint arXiv:1506.05731, 2015

  53. [53]

    Zep: a temporal knowledge graph architecture for agent memory.arXiv preprint arXiv:2501.13956, 2025

    Preston Rasmussen, Pavlo Paliychuk, Travis Beauvais, Jack Ryan, and Daniel Chalef. Zep: a temporal knowledge graph architecture for agent memory.arXiv preprint arXiv:2501.13956, 2025

  54. [54]

    Some simple effective approximations to the 2-poisson model for probabilistic weighted retrieval

    Stephen E Robertson and Steve Walker. Some simple effective approximations to the 2-poisson model for probabilistic weighted retrieval. InSIGIR’94: Proceedings of the Seventeenth Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, organised by Dublin City University, pages 232–241. Springer, 1994

  55. [55]

    Motor: Multimodal optimal transport via grounded retrieval in medical visual question answering

    Mai A Shaaban, Tausifa Jan Saleem, Vijay Ram Kumar Papineni, and Mohammad Yaqub. Motor: Multimodal optimal transport via grounded retrieval in medical visual question answering. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 459–469. Springer, 2025

  56. [56]

    Enhancing retrieval-augmented large language models with iterative retrieval-generation synergy

    Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, and Weizhu Chen. Enhancing retrieval-augmented large language models with iterative retrieval-generation synergy. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 9248–9274, 2023

  57. [57]

    Openai gpt-5 system card. arXiv preprint arXiv:2601.03267, 2025

    Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, AJ Ostrow, Akhila Ananthram, et al. Openai gpt-5 system card. arXiv preprint arXiv:2601.03267, 2025

  58. [58]

    Large language model reasoning failures. Transactions on Machine Learning Research

    Peiyang Song, Pengrui Han, and Noah Goodman. Large language model reasoning failures. Transactions on Machine Learning Research

  59. [59]

    A robust optimization approach to flow decomposition. arXiv preprint arXiv:2410.21140, 2024

    Moritz Stinzendörfer, Philine Schiewe, and Fabricio Oliveira. A robust optimization approach to flow decomposition. arXiv preprint arXiv:2410.21140, 2024

  60. [60]

    The multi-round diagnostic rag framework for emulating clinical reasoning. arXiv preprint arXiv:2504.07724, 2025

    Penglei Sun, Yixiang Chen, Xiang Li, and Xiaowen Chu. The multi-round diagnostic rag framework for emulating clinical reasoning. arXiv preprint arXiv:2504.07724, 2025

  61. [61]

    Visdom: Multi-document qa with visually rich elements using multimodal retrieval-augmented generation

    Manan Suri, Puneet Mathur, Franck Dernoncourt, Kanika Goswami, Ryan A Rossi, and Dinesh Manocha. Visdom: Multi-document qa with visually rich elements using multimodal retrieval-augmented generation. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Vo...

  62. [62]

    Slidevqa: A dataset for document visual question answering on multiple images

    Ryota Tanaka, Kyosuke Nishida, Kosuke Nishida, Taku Hasegawa, Itsumi Saito, and Kuniko Saito. Slidevqa: A dataset for document visual question answering on multiple images. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 13636–13645, 2023

  63. [63]

    Otextsum: Extractive text summarisation with optimal transport

    Peggy Tang, Kun Hu, Rui Yan, Lei Zhang, Junbin Gao, and Zhiyong Wang. Otextsum: Extractive text summarisation with optimal transport. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 1128–1141, 2022

  64. [64]

    Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions

    Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal. Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10014–10037, 2023

  65. [65]

    Mmgraphrag: Bridging vision and language with interpretable multimodal knowledge graphs. arXiv preprint arXiv:2507.20804, 2025

    Xueyao Wan and Hang Yu. Mmgraphrag: Bridging vision and language with interpretable multimodal knowledge graphs. arXiv preprint arXiv:2507.20804, 2025

  66. [66]

    Mineru: An open-source solution for precise document content extraction. arXiv preprint arXiv:2409.18839, 2024

    Bin Wang, Chao Xu, Xiaomeng Zhao, Linke Ouyang, Fan Wu, Zhiyuan Zhao, Rui Xu, Kaiwen Liu, Yuan Qu, Fukai Shang, et al. Mineru: An open-source solution for precise document content extraction. arXiv preprint arXiv:2409.18839, 2024

  67. [67]

    Vidorag: Visual document retrieval-augmented generation via dynamic iterative reasoning agents

    Qiuchen Wang, Ruixue Ding, Zehui Chen, Weiqi Wu, Shihang Wang, Pengjun Xie, and Feng Zhao. Vidorag: Visual document retrieval-augmented generation via dynamic iterative reasoning agents. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 9124–9145, 2025

  68. [68]

    Document segmentation matters for retrieval-augmented generation

    Zhitong Wang, Cheng Gao, Chaojun Xiao, Yufei Huang, Shuzheng Si, Kangyang Luo, Yuzhuo Bai, Wenhao Li, Tangjian Duan, Chuancheng Lv, et al. Document segmentation matters for retrieval-augmented generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 8063–8075, 2025

  69. [69]

    Speculative rag: Enhancing retrieval augmented generation through drafting. arXiv preprint arXiv:2407.08223, 2024

    Zilong Wang, Zifeng Wang, Long Le, Huaixiu Steven Zheng, Swaroop Mishra, Vincent Perot, Yuwei Zhang, Anush Mattapalli, Ankur Taly, Jingbo Shang, et al. Speculative rag: Enhancing retrieval augmented generation through drafting. arXiv preprint arXiv:2407.08223, 2024

  70. [70]

    Deepseek-ocr: Contexts optical compression. arXiv preprint arXiv:2510.18234, 2025

    Haoran Wei, Yaofeng Sun, and Yukun Li. Deepseek-ocr: Contexts optical compression. arXiv preprint arXiv:2510.18234, 2025

  71. [71]

    Evolutionary Game Theory

    Jörgen W. Weibull. Evolutionary Game Theory. MIT Press, Cambridge, MA, 1995

  72. [72]

    Mmed-rag: Versatile multimodal rag system for medical vision language models. arXiv preprint arXiv:2410.13085, 2024

    Peng Xia, Kangyu Zhu, Haoran Li, Tianze Wang, Weijia Shi, Sheng Wang, Linjun Zhang, James Zou, and Huaxiu Yao. Mmed-rag: Versatile multimodal rag system for medical vision language models. arXiv preprint arXiv:2410.13085, 2024

  73. [73]

    Knowledge conflicts for llms: A survey

    Rongwu Xu, Zehan Qi, Zhijiang Guo, Cunxiang Wang, Hongru Wang, Yue Zhang, and Wei Xu. Knowledge conflicts for llms: A survey. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 8541–8565, 2024

  74. [74]

    A-mem: Agentic memory for llm agents. arXiv preprint arXiv:2502.12110, 2025

    Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. A-mem: Agentic memory for llm agents. arXiv preprint arXiv:2502.12110, 2025

  75. [75]

    Efficient algorithms for personalized pagerank computation: A survey. IEEE Transactions on Knowledge and Data Engineering, 36(9):4582–4602, 2024

    Mingji Yang, Hanzhi Wang, Zhewei Wei, Sibo Wang, and Ji-Rong Wen. Efficient algorithms for personalized pagerank computation: A survey. IEEE Transactions on Knowledge and Data Engineering, 36(9):4582–4602, 2024

  76. [76]

    React: Synergizing reasoning and acting in language models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations

  77. [77]

    Visrag: Vision-based retrieval-augmented generation on multi-modality documents

    Shi Yu, Chaoyue Tang, Bokai Xu, Junbo Cui, Junhao Ran, Yukun Yan, Zhenghao Liu, Shuo Wang, Xu Han, Zhiyuan Liu, et al. Visrag: Vision-based retrieval-augmented generation on multi-modality documents. In The Thirteenth International Conference on Learning Representations

  78. [78]

    Auto-rag: Autonomous retrieval-augmented generation for large language models. arXiv preprint arXiv:2411.19443, 2024

    Tian Yu, Shaolei Zhang, and Yang Feng. Auto-rag: Autonomous retrieval-augmented generation for large language models. arXiv preprint arXiv:2411.19443, 2024

  79. [79]

    Stochastic rag: End-to-end retrieval-augmented generation through expected utility maximization

    Hamed Zamani and Michael Bendersky. Stochastic rag: End-to-end retrieval-augmented generation through expected utility maximization. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2641–2646, 2024

