SproutRAG: Attention-Guided Tree Search with Progressive Embeddings for Long-Document RAG

Amirhossein Abaskohi; Giuseppe Carenini; Issam H. Laradji; Peter West

arxiv: 2606.18381 · v1 · pith:UCLPRJD6new · submitted 2026-06-16 · 💻 cs.CL · cs.IR

SproutRAG: Attention-Guided Tree Search with Progressive Embeddings for Long-Document RAG

Amirhossein Abaskohi , Issam H. Laradji , Peter West , Giuseppe Carenini This is my paper

Pith reviewed 2026-06-27 00:47 UTC · model grok-4.3

classification 💻 cs.CL cs.IR

keywords RAGhierarchical retrievalattention-guided chunkingbinary treemulti-granularitylong document retrievalinformation efficiencyprogressive embeddings

0 comments

The pith

SproutRAG uses learned attention to build a binary chunking tree for multi-granularity retrieval in long documents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SproutRAG as a way to organize sentence chunks into a hierarchical binary tree based on attention patterns from the embedding model. This structure supports retrieving information at different levels of detail while preserving semantic connections. It avoids depending on expensive LLM calls for chunking or losing details through summarization. The method trains the embeddings and tree together. Tests on four benchmarks show an average 6.1 percent gain in information efficiency over the best prior approach.

Core claim

SproutRAG learns inter-sentence attention to construct a binary chunking tree that groups sentences into progressively larger coherent units, then applies hierarchical beam search at retrieval time to pull candidates from multiple granularities, all without extra LLM calls or compressed summaries, yielding 6.1 percent higher information efficiency on average.

What carries the argument

The attention-guided binary chunking tree, built by selecting the best attention heads and layers to capture document structure and trained jointly with the embeddings.

If this is right

Multi-granularity retrieval becomes possible without single-level limits or information loss.
Indexing and retrieval avoid costly LLM calls required by prior chunking methods.
End-to-end training improves both the quality of embeddings and the usefulness of the tree structure.
The approach applies to scientific, legal, and open-domain documents with consistent gains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The tree could enable more efficient indexing for very long documents by reusing the hierarchical structure.
Attention patterns might transfer across different embedding models if the selection process generalizes.
Combining this with other RAG techniques like query expansion could compound the efficiency gains.

Load-bearing premise

Learned attention patterns from the embedding model can reliably form a binary tree of chunks that stays semantically coherent at every scale without needing external labels or causing information loss.

What would settle it

Experiments on the same benchmarks that find no gain in information efficiency or produce trees where adjacent chunks lack clear semantic links when checked manually.

Figures

Figures reproduced from arXiv: 2606.18381 by Amirhossein Abaskohi, Giuseppe Carenini, Issam H. Laradji, Peter West.

**Figure 1.** Figure 1: SPROUTRAG segments a long document into sentence-level chunks, uses SLLM attention to identify semantically related sentences, and organizes them into an attention-guided binary tree. Retrieval then selects evidence across fine-grained leaves, mid-level nodes, and broader subtrees, preserving precision while recovering coherence. lations, but are less effective when fine-grained chunks contain sparse enti… view at source ↗

**Figure 2.** Figure 2: Overview of SPROUTRAG. In the offline indexing phase (Phase 1), documents are split into sentencelevel chunks and encoded with a SLLM to obtain sentence embeddings and inter-sentence attention. Learned aggregation over attention heads and layers guides bottom-up tree construction, producing an attention tree with sentence embeddings at the leaves and progressive embeddings at internal nodes. In the online… view at source ↗

read the original abstract

Retrieval-augmented generation (RAG) systems must balance retrieval granularity with contextual coherence, a challenge that existing methods address through LLM-guided chunking, single-level context expansion, or hierarchical summarization. These approaches variously depend on costly LLM calls during indexing or retrieval, limit context aggregation to a single granularity level, or introduce information loss through summarization. We present SproutRAG, an attention-guided hierarchical RAG framework that addresses this trade-off by organizing sentence-level chunks into progressively larger but semantically coherent units, using learned inter-sentence attention to construct a binary chunking tree. Unlike prior approaches that rely on external LLMs, fixed context expansion, or lossy summarization, SproutRAG learns which attention heads and layers best capture semantic document structure, enabling multi-granularity retrieval without additional LLM calls or compressed summaries. At retrieval time, SproutRAG uses hierarchical beam search to retrieve candidates at multiple granularities, capturing multi-sentence relevance beyond flat retrieval. The framework is trained end-to-end with a joint objective that improves both embeddings and tree structure. Experiments across four benchmarks spanning scientific, legal, and open-domain settings demonstrate that SproutRAG improves information efficiency (IE) by 6.1% on average over the strongest baseline. Code is available on https://github.com/AmirAbaskohi/SproutRAG.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SproutRAG's main move is using the embedding model's own attention to build a binary chunking tree for multi-granularity RAG without LLM calls at index time, but the abstract leaves it unclear whether the reported 6.1% IE gain actually depends on that tree or just the hierarchical search.

read the letter

The paper's core idea is to let inter-sentence attention inside a standard embedding model decide how to group sentences into a binary tree of progressively larger chunks. At query time it runs hierarchical beam search across those levels. That combination, plus the claim that it avoids both LLM chunking and summarization loss, is what they present as new.

What stands out is the attempt to get the hierarchy for free from the same model used for embeddings, trained end-to-end. If the attention patterns really do produce coherent multi-granularity units, it could cut indexing cost compared with methods that call an LLM for every document. The code release is also a plus for anyone who wants to test it.

The soft spot is that the abstract supplies almost no experimental detail. We get an average 6.1% IE improvement over the strongest baseline across four domains, but nothing on how baselines were implemented, whether statistical tests were run, or any ablation that swaps the learned tree for a fixed or random hierarchy. Without that, it is hard to know whether the attention-guided construction is load-bearing or whether the gain comes mainly from the beam search itself. The stress-test concern about coherence is reasonable on the current evidence; the paper asserts the tree stays lossless but does not show boundary alignment or reconstruction checks.

This is the kind of paper that would interest people already working on long-document RAG who are tired of LLM-dependent chunkers. A reader who wants a concrete alternative to single-level expansion or summarization might get something useful out of the method once the experiments are filled in.

I would send it to peer review. The approach is distinct enough from the baselines named in the abstract that referees can check whether the tree actually adds value beyond the search procedure.

Referee Report

3 major / 2 minor

Summary. The paper introduces SproutRAG, a hierarchical RAG framework that learns inter-sentence attention patterns from an embedding model to construct a binary chunking tree organizing sentence-level units into multi-granularity semantically coherent chunks. It performs end-to-end training with a joint objective on embeddings and tree structure, then applies hierarchical beam search at retrieval time to retrieve across granularities without LLM calls or summarization. The central empirical claim is a 6.1% average improvement in information efficiency (IE) over the strongest baseline across four benchmarks in scientific, legal, and open-domain settings.

Significance. If the central claim holds after verification, the work would demonstrate a parameter-efficient way to obtain multi-granularity retrieval by repurposing attention heads/layers already present in the embedding model, avoiding the cost of LLM-guided chunking or lossy summarization. The public release of code is a clear strength that supports reproducibility.

major comments (3)

[Abstract / Experiments] Abstract and Experiments section: the 6.1% average IE improvement is stated without any description of the experimental design (number of runs, statistical tests, baseline definitions, or variance), which is load-bearing for the claim that the attention-guided tree drives the gain rather than the hierarchical beam search alone.
[Abstract / §4] Abstract and §4 (method): the premise that learned attention produces a binary chunking tree preserving semantic coherence without external supervision or information loss is invoked to justify the framework, yet no ablation replaces the learned tree with fixed/random hierarchies or reports a direct coherence metric (e.g., boundary F1 or reconstruction fidelity) to confirm the tree quality is load-bearing.
[Experiments] Experiments: no results are shown isolating the contribution of the attention-guided tree construction versus the hierarchical beam search component; if the IE lift persists under a non-semantic hierarchy, the learned attention mechanism would not be necessary for the reported improvement.

minor comments (2)

[§3] Notation for the binary chunking tree and progressive embeddings should be introduced with a small diagram or explicit recursive definition early in §3 to aid readability.
[§4] The joint training objective is described at a high level; an explicit loss equation would clarify how the embedding and tree-structure terms interact.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract / Experiments] Abstract and Experiments section: the 6.1% average IE improvement is stated without any description of the experimental design (number of runs, statistical tests, baseline definitions, or variance), which is load-bearing for the claim that the attention-guided tree drives the gain rather than the hierarchical beam search alone.

Authors: We agree that additional details on the experimental design would improve clarity and support the central claim. In the revised manuscript, we will update the abstract to briefly mention the experimental setup, including that results are averaged over 5 runs with standard deviation reported, and that baselines are the strongest reported in prior work. We will also add a dedicated paragraph in the Experiments section summarizing the design, statistical tests (paired t-tests), and variance. This addresses the concern that the gain is attributable to the proposed method. revision: yes
Referee: [Abstract / §4] Abstract and §4 (method): the premise that learned attention produces a binary chunking tree preserving semantic coherence without external supervision or information loss is invoked to justify the framework, yet no ablation replaces the learned tree with fixed/random hierarchies or reports a direct coherence metric (e.g., boundary F1 or reconstruction fidelity) to confirm the tree quality is load-bearing.

Authors: This is a valid point. While the joint training objective is designed to optimize for semantic coherence, we acknowledge the lack of direct ablations. In the revision, we will add experiments ablating the learned tree against fixed binary hierarchies and random chunking, and report a coherence metric based on sentence boundary alignment with human-annotated sections where available. This will demonstrate that the attention-guided construction is load-bearing. revision: yes
Referee: [Experiments] Experiments: no results are shown isolating the contribution of the attention-guided tree construction versus the hierarchical beam search component; if the IE lift persists under a non-semantic hierarchy, the learned attention mechanism would not be necessary for the reported improvement.

Authors: We agree that isolating these components is important. The current experiments compare the full SproutRAG against baselines, but to directly address this, we will include new results in the revised version showing performance with the hierarchical beam search but using non-learned (fixed) hierarchies. This will help confirm the necessity of the attention-guided tree. revision: yes

Circularity Check

0 steps flagged

No circularity: performance measured on external benchmarks with no self-referential derivations

full rationale

The paper describes an end-to-end trained framework evaluated via information efficiency on four external benchmarks (scientific, legal, open-domain). No equations, derivations, or self-citations are presented that reduce the claimed 6.1% IE gain to a quantity fitted or defined by the method itself. The tree construction and joint objective are described as learned components, but the reported results are positioned as measured outcomes rather than tautological outputs of the training process. This is the standard non-circular case for an empirical systems paper.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that attention patterns capture semantic structure and on the introduction of a new binary tree organization; no free parameters or external evidence for the invented structure are described in the abstract.

free parameters (1)

selection of attention heads and layers
The method learns which heads and layers to use, implying a selection process whose details are not specified in the abstract.

axioms (1)

domain assumption Inter-sentence attention from the embedding model captures semantic document structure
Invoked to justify construction of the binary chunking tree without external supervision.

invented entities (1)

binary chunking tree no independent evidence
purpose: Organize sentence chunks into progressively larger semantically coherent units for multi-granularity retrieval
New organizational structure introduced by the framework

pith-pipeline@v0.9.1-grok · 5796 in / 1284 out tokens · 46302 ms · 2026-06-27T00:47:30.907412+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

38 extracted references · 16 canonical work pages

[1]

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks , url =

Lewis, Patrick and Perez, Ethan and Piktus, Aleksandra and Petroni, Fabio and Karpukhin, Vladimir and Goyal, Naman and K\". Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks , url =. Advances in Neural Information Processing Systems , editor =
[2]

Nature Machine Intelligence , volume=

Factuality challenges in the era of large language models and opportunities for fact-checking , author=. Nature Machine Intelligence , volume=. 2024 , publisher=

2024
[3]

Long-Context

Bowen Jin and Jinsung Yoon and Jiawei Han and Sercan O Arik , booktitle=. Long-Context. 2025 , url=

2025
[4]

URL https://aclanthology.org/2024.tacl-1.9/

Liu, Nelson F. and Lin, Kevin and Hewitt, John and Paranjape, Ashwin and Bevilacqua, Michele and Petroni, Fabio and Liang, Percy. Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics. 2024. doi:10.1162/tacl_a_00638

work page doi:10.1162/tacl_a_00638 2024
[5]

SAKI - RAG : Mitigating Context Fragmentation in Long-Document RAG via Sentence-level Attention Knowledge Integration

Tao, Wenyu and Xing, Xiaofen and Li, Zeliang and Xu, Xiangmin. SAKI - RAG : Mitigating Context Fragmentation in Long-Document RAG via Sentence-level Attention Knowledge Integration. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.63

work page doi:10.18653/v1/2025.emnlp-main.63 2025
[6]

2025 , eprint=

Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models , author=. 2025 , eprint=

2025
[7]

2025 , eprint=

Meta-Chunking: Learning Text Segmentation and Semantic Completion via Logical Perception , author=. 2025 , eprint=

2025
[8]

Dense X retrieval: What retrieval granularity should we use?, in: Al-Onaizan, Y ., Bansal, M., Chen, Y .N

Chen, Tong and Wang, Hongwei and Chen, Sihao and Yu, Wenhao and Ma, Kaixin and Zhao, Xinran and Zhang, Hongming and Yu, Dong. Dense X Retrieval: What Retrieval Granularity Should We Use?. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.845

work page doi:10.18653/v1/2024.emnlp-main.845 2024
[9]

2024 , url=

Parth Sarthi and Salman Abdullah and Aditi Tuli and Shubh Khanna and Anna Goldie and Christopher D Manning , booktitle=. 2024 , url=

2024
[10]

2025 , eprint=

From Local to Global: A Graph RAG Approach to Query-Focused Summarization , author=. 2025 , eprint=

2025
[11]

L ight RAG : Simple and Fast Retrieval-Augmented Generation

Guo, Zirui and Xia, Lianghao and Yu, Yanhua and Ao, Tu and Huang, Chao. L ight RAG : Simple and Fast Retrieval-Augmented Generation. Findings of the Association for Computational Linguistics: EMNLP 2025. 2025. doi:10.18653/v1/2025.findings-emnlp.568

work page doi:10.18653/v1/2025.findings-emnlp.568 2025
[12]

2024 , eprint=

SentenceVAE: Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context , author=. 2024 , eprint=

2024
[13]

LangChain: A framework for developing applications powered by language models , author=
[14]

P rop RAG : Guiding Retrieval with Beam Search over Proposition Paths

Wang, Jingjin and Han, Jiawei. P rop RAG : Guiding Retrieval with Beam Search over Proposition Paths. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.317

work page doi:10.18653/v1/2025.emnlp-main.317 2025
[15]

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) , month = nov, year =

Karpukhin, Vladimir and Oguz, Barlas and Min, Sewon and Lewis, Patrick and Wu, Ledell and Edunov, Sergey and Chen, Danqi and Yih, Wen-tau. Dense Passage Retrieval for Open-Domain Question Answering. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.550

work page doi:10.18653/v1/2020.emnlp-main.550 2020
[16]

Sentence- BERT : Sentence Embeddings using S iamese BERT -Networks

Reimers, Nils and Gurevych, Iryna. Sentence- BERT : Sentence Embeddings using S iamese BERT -Networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. doi:10.18653/v1/D19-1410

work page doi:10.18653/v1/d19-1410 2019
[17]

2024 , eprint=

Text Embeddings by Weakly-Supervised Contrastive Pre-training , author=. 2024 , eprint=

2024
[18]

2023 , eprint=

Towards General Text Embeddings with Multi-stage Contrastive Learning , author=. 2023 , eprint=

2023
[19]

Proceedings of the 2021

Gao, Tianyu and Yao, Xingcheng and Chen, Danqi. S im CSE : Simple Contrastive Learning of Sentence Embeddings. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021. doi:10.18653/v1/2021.emnlp-main.552

work page doi:10.18653/v1/2021.emnlp-main.552 2021
[20]

End-to-End Beam Retrieval for Multi-Hop Question Answering

Zhang, Jiahao and Zhang, Haiyang and Zhang, Dongmei and Yong, Liu and Huang, Shen. End-to-End Beam Retrieval for Multi-Hop Question Answering. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.naacl-long.96

work page doi:10.18653/v1/2024.naacl-long.96 2024
[21]

CoRR , volume =

Tri Nguyen and Mir Rosenberg and Xia Song and Jianfeng Gao and Saurabh Tiwary and Rangan Majumder and Li Deng , title =. CoRR , volume =. 2016 , url =

2016
[22]

MoC: Mixtures of text chunking learners for retrieval-augmented generation system, in: Che, W., Nabende, J., Shutova, E., Pilehvar, M.T

Zhao, Jihao and Ji, Zhiyuan and Fan, Zhaoxin and Wang, Hanyu and Niu, Simin and Tang, Bo and Xiong, Feiyu and Li, Zhiyu. M o C : Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.acl-long.258

work page doi:10.18653/v1/2025.acl-long.258 2025
[23]

H op RAG : Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation

Liu, Hao and Wang, Zhengren and Chen, Xi and Li, Zhiyu and Xiong, Feiyu and Yu, Qinhan and Zhang, Wentao. H op RAG : Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation. Findings of the Association for Computational Linguistics: ACL 2025. 2025. doi:10.18653/v1/2025.findings-acl.97

work page doi:10.18653/v1/2025.findings-acl.97 2025
[24]

Specter:

Cohan, Arman and Feldman, Sergey and Beltagy, Iz and Downey, Doug and Weld, Daniel. SPECTER : Document-level Representation Learning using Citation-informed Transformers. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.207

work page doi:10.18653/v1/2020.acl-main.207 2020
[25]

2024 , eprint=

LegalBench-RAG: A Benchmark for Retrieval-Augmented Generation in the Legal Domain , author=. 2024 , eprint=

2024
[26]

RAGE val: Scenario Specific RAG Evaluation Dataset Generation Framework

Zhu, Kunlun and Luo, Yifan and Xu, Dingling and Yan, Yukun and Liu, Zhenghao and Yu, Shi and Wang, Ruobing and Wang, Shuo and Li, Yishan and Zhang, Nan and Han, Xu and Liu, Zhiyuan and Sun, Maosong. RAGE val: Scenario Specific RAG Evaluation Dataset Generation Framework. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguisti...

work page doi:10.18653/v1/2025.acl-long.418 2025
[27]

PageIndex Blog , year =

Mingtian Zhang and Yu Tang and PageIndex Team , title =. PageIndex Blog , year =
[28]

2025 , eprint=

REFRAG: Rethinking RAG based Decoding , author=. 2025 , eprint=

2025
[29]

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned

Voita, Elena and Talbot, David and Moiseev, Fedor and Sennrich, Rico and Titov, Ivan. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1580

work page doi:10.18653/v1/p19-1580 2019
[30]

2025 , eprint=

CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning , author=. 2025 , eprint=

2025
[31]

R eflective RAG : Rethinking Adaptivity in Retrieval-Augmented Generation

Verma, Akshay and Gupta, Swapnil and Pillai, Siddharth and Sircar, Prateek and Gupta, Deepak. R eflective RAG : Rethinking Adaptivity in Retrieval-Augmented Generation. Proceedings of the 19th Conference of the E uropean Chapter of the A ssociation for C omputational L inguistics (Volume 5: Industry Track). 2026. doi:10.18653/v1/2026.eacl-industry.27

work page doi:10.18653/v1/2026.eacl-industry.27 2026
[32]

H otpot QA : A Dataset for Diverse, Explainable Multi-hop Question Answering

Yang, Zhilin and Qi, Peng and Zhang, Saizheng and Bengio, Yoshua and Cohen, William and Salakhutdinov, Ruslan and Manning, Christopher D. H otpot QA : A Dataset for Diverse, Explainable Multi-hop Question Answering. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018. doi:10.18653/v1/D18-1259

work page doi:10.18653/v1/d18-1259 2018
[33]

Semantic Parsing on F reebase from Question-Answer Pairs

Berant, Jonathan and Chou, Andrew and Frostig, Roy and Liang, Percy. Semantic Parsing on F reebase from Question-Answer Pairs. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013

2013
[34]

2025 , eprint=

Qwen3 Technical Report , author=. 2025 , eprint=

2025
[35]

2025 , eprint=

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models , author=. 2025 , eprint=

2025
[36]

ROUGE : A Package for Automatic Evaluation of Summaries

Lin, Chin-Yew. ROUGE : A Package for Automatic Evaluation of Summaries. Text Summarization Branches Out. 2004

2004
[37]

METEOR : An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments

Banerjee, Satanjeev and Lavie, Alon. METEOR : An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 2005

2005
[38]

International Conference on Learning Representations , year=

BERTScore: Evaluating Text Generation with BERT , author=. International Conference on Learning Representations , year=

[1] [1]

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks , url =

Lewis, Patrick and Perez, Ethan and Piktus, Aleksandra and Petroni, Fabio and Karpukhin, Vladimir and Goyal, Naman and K\". Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks , url =. Advances in Neural Information Processing Systems , editor =

[2] [2]

Nature Machine Intelligence , volume=

Factuality challenges in the era of large language models and opportunities for fact-checking , author=. Nature Machine Intelligence , volume=. 2024 , publisher=

2024

[3] [3]

Long-Context

Bowen Jin and Jinsung Yoon and Jiawei Han and Sercan O Arik , booktitle=. Long-Context. 2025 , url=

2025

[4] [4]

URL https://aclanthology.org/2024.tacl-1.9/

Liu, Nelson F. and Lin, Kevin and Hewitt, John and Paranjape, Ashwin and Bevilacqua, Michele and Petroni, Fabio and Liang, Percy. Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics. 2024. doi:10.1162/tacl_a_00638

work page doi:10.1162/tacl_a_00638 2024

[5] [5]

SAKI - RAG : Mitigating Context Fragmentation in Long-Document RAG via Sentence-level Attention Knowledge Integration

Tao, Wenyu and Xing, Xiaofen and Li, Zeliang and Xu, Xiangmin. SAKI - RAG : Mitigating Context Fragmentation in Long-Document RAG via Sentence-level Attention Knowledge Integration. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.63

work page doi:10.18653/v1/2025.emnlp-main.63 2025

[6] [6]

2025 , eprint=

Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models , author=. 2025 , eprint=

2025

[7] [7]

2025 , eprint=

Meta-Chunking: Learning Text Segmentation and Semantic Completion via Logical Perception , author=. 2025 , eprint=

2025

[8] [8]

Dense X retrieval: What retrieval granularity should we use?, in: Al-Onaizan, Y ., Bansal, M., Chen, Y .N

Chen, Tong and Wang, Hongwei and Chen, Sihao and Yu, Wenhao and Ma, Kaixin and Zhao, Xinran and Zhang, Hongming and Yu, Dong. Dense X Retrieval: What Retrieval Granularity Should We Use?. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.845

work page doi:10.18653/v1/2024.emnlp-main.845 2024

[9] [9]

2024 , url=

Parth Sarthi and Salman Abdullah and Aditi Tuli and Shubh Khanna and Anna Goldie and Christopher D Manning , booktitle=. 2024 , url=

2024

[10] [10]

2025 , eprint=

From Local to Global: A Graph RAG Approach to Query-Focused Summarization , author=. 2025 , eprint=

2025

[11] [11]

L ight RAG : Simple and Fast Retrieval-Augmented Generation

Guo, Zirui and Xia, Lianghao and Yu, Yanhua and Ao, Tu and Huang, Chao. L ight RAG : Simple and Fast Retrieval-Augmented Generation. Findings of the Association for Computational Linguistics: EMNLP 2025. 2025. doi:10.18653/v1/2025.findings-emnlp.568

work page doi:10.18653/v1/2025.findings-emnlp.568 2025

[12] [12]

2024 , eprint=

SentenceVAE: Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context , author=. 2024 , eprint=

2024

[13] [13]

LangChain: A framework for developing applications powered by language models , author=

[14] [14]

P rop RAG : Guiding Retrieval with Beam Search over Proposition Paths

Wang, Jingjin and Han, Jiawei. P rop RAG : Guiding Retrieval with Beam Search over Proposition Paths. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.317

work page doi:10.18653/v1/2025.emnlp-main.317 2025

[15] [15]

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) , month = nov, year =

Karpukhin, Vladimir and Oguz, Barlas and Min, Sewon and Lewis, Patrick and Wu, Ledell and Edunov, Sergey and Chen, Danqi and Yih, Wen-tau. Dense Passage Retrieval for Open-Domain Question Answering. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.550

work page doi:10.18653/v1/2020.emnlp-main.550 2020

[16] [16]

Sentence- BERT : Sentence Embeddings using S iamese BERT -Networks

Reimers, Nils and Gurevych, Iryna. Sentence- BERT : Sentence Embeddings using S iamese BERT -Networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. doi:10.18653/v1/D19-1410

work page doi:10.18653/v1/d19-1410 2019

[17] [17]

2024 , eprint=

Text Embeddings by Weakly-Supervised Contrastive Pre-training , author=. 2024 , eprint=

2024

[18] [18]

2023 , eprint=

Towards General Text Embeddings with Multi-stage Contrastive Learning , author=. 2023 , eprint=

2023

[19] [19]

Proceedings of the 2021

Gao, Tianyu and Yao, Xingcheng and Chen, Danqi. S im CSE : Simple Contrastive Learning of Sentence Embeddings. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021. doi:10.18653/v1/2021.emnlp-main.552

work page doi:10.18653/v1/2021.emnlp-main.552 2021

[20] [20]

End-to-End Beam Retrieval for Multi-Hop Question Answering

Zhang, Jiahao and Zhang, Haiyang and Zhang, Dongmei and Yong, Liu and Huang, Shen. End-to-End Beam Retrieval for Multi-Hop Question Answering. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.naacl-long.96

work page doi:10.18653/v1/2024.naacl-long.96 2024

[21] [21]

CoRR , volume =

Tri Nguyen and Mir Rosenberg and Xia Song and Jianfeng Gao and Saurabh Tiwary and Rangan Majumder and Li Deng , title =. CoRR , volume =. 2016 , url =

2016

[22] [22]

MoC: Mixtures of text chunking learners for retrieval-augmented generation system, in: Che, W., Nabende, J., Shutova, E., Pilehvar, M.T

Zhao, Jihao and Ji, Zhiyuan and Fan, Zhaoxin and Wang, Hanyu and Niu, Simin and Tang, Bo and Xiong, Feiyu and Li, Zhiyu. M o C : Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.acl-long.258

work page doi:10.18653/v1/2025.acl-long.258 2025

[23] [23]

H op RAG : Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation

Liu, Hao and Wang, Zhengren and Chen, Xi and Li, Zhiyu and Xiong, Feiyu and Yu, Qinhan and Zhang, Wentao. H op RAG : Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation. Findings of the Association for Computational Linguistics: ACL 2025. 2025. doi:10.18653/v1/2025.findings-acl.97

work page doi:10.18653/v1/2025.findings-acl.97 2025

[24] [24]

Specter:

Cohan, Arman and Feldman, Sergey and Beltagy, Iz and Downey, Doug and Weld, Daniel. SPECTER : Document-level Representation Learning using Citation-informed Transformers. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.207

work page doi:10.18653/v1/2020.acl-main.207 2020

[25] [25]

2024 , eprint=

LegalBench-RAG: A Benchmark for Retrieval-Augmented Generation in the Legal Domain , author=. 2024 , eprint=

2024

[26] [26]

RAGE val: Scenario Specific RAG Evaluation Dataset Generation Framework

Zhu, Kunlun and Luo, Yifan and Xu, Dingling and Yan, Yukun and Liu, Zhenghao and Yu, Shi and Wang, Ruobing and Wang, Shuo and Li, Yishan and Zhang, Nan and Han, Xu and Liu, Zhiyuan and Sun, Maosong. RAGE val: Scenario Specific RAG Evaluation Dataset Generation Framework. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguisti...

work page doi:10.18653/v1/2025.acl-long.418 2025

[27] [27]

PageIndex Blog , year =

Mingtian Zhang and Yu Tang and PageIndex Team , title =. PageIndex Blog , year =

[28] [28]

2025 , eprint=

REFRAG: Rethinking RAG based Decoding , author=. 2025 , eprint=

2025

[29] [29]

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned

Voita, Elena and Talbot, David and Moiseev, Fedor and Sennrich, Rico and Titov, Ivan. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1580

work page doi:10.18653/v1/p19-1580 2019

[30] [30]

2025 , eprint=

CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning , author=. 2025 , eprint=

2025

[31] [31]

R eflective RAG : Rethinking Adaptivity in Retrieval-Augmented Generation

Verma, Akshay and Gupta, Swapnil and Pillai, Siddharth and Sircar, Prateek and Gupta, Deepak. R eflective RAG : Rethinking Adaptivity in Retrieval-Augmented Generation. Proceedings of the 19th Conference of the E uropean Chapter of the A ssociation for C omputational L inguistics (Volume 5: Industry Track). 2026. doi:10.18653/v1/2026.eacl-industry.27

work page doi:10.18653/v1/2026.eacl-industry.27 2026

[32] [32]

H otpot QA : A Dataset for Diverse, Explainable Multi-hop Question Answering

Yang, Zhilin and Qi, Peng and Zhang, Saizheng and Bengio, Yoshua and Cohen, William and Salakhutdinov, Ruslan and Manning, Christopher D. H otpot QA : A Dataset for Diverse, Explainable Multi-hop Question Answering. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018. doi:10.18653/v1/D18-1259

work page doi:10.18653/v1/d18-1259 2018

[33] [33]

Semantic Parsing on F reebase from Question-Answer Pairs

Berant, Jonathan and Chou, Andrew and Frostig, Roy and Liang, Percy. Semantic Parsing on F reebase from Question-Answer Pairs. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013

2013

[34] [34]

2025 , eprint=

Qwen3 Technical Report , author=. 2025 , eprint=

2025

[35] [35]

2025 , eprint=

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models , author=. 2025 , eprint=

2025

[36] [36]

ROUGE : A Package for Automatic Evaluation of Summaries

Lin, Chin-Yew. ROUGE : A Package for Automatic Evaluation of Summaries. Text Summarization Branches Out. 2004

2004

[37] [37]

METEOR : An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments

Banerjee, Satanjeev and Lavie, Alon. METEOR : An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 2005

2005

[38] [38]

International Conference on Learning Representations , year=

BERTScore: Evaluating Text Generation with BERT , author=. International Conference on Learning Representations , year=