RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Pith reviewed 2026-05-15 13:02 UTC · model grok-4.3
The pith
Recursive clustering and summarization builds a tree that improves retrieval-augmented reasoning over long documents.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By recursively embedding, clustering, and summarizing chunks of text, RAPTOR constructs a tree with differing levels of summarization from the bottom up. At inference time the model retrieves from this tree, integrating information across lengthy documents at different levels of abstraction. Controlled experiments show that retrieval with recursive summaries offers significant improvements over traditional retrieval-augmented language models on several tasks, achieving state-of-the-art results on question-answering benchmarks that involve complex, multi-step reasoning.
What carries the argument
The RAPTOR tree, constructed bottom-up by recursive embedding, clustering, and abstractive summarization of text chunks.
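The bottom-up construction can be sketched end to end. This is a toy illustration, not the paper's implementation: RAPTOR uses SBERT embeddings, UMAP-reduced Gaussian-mixture clustering, and LLM-generated summaries, all of which are replaced here by hypothetical stand-ins (`embed`, `cluster`, and `summarize` are simplified placeholders).

```python
from collections import Counter

def embed(text):
    """Toy stand-in for a sentence embedder: a bag-of-words vector."""
    return Counter(text.lower().split())

def similarity(a, b):
    """Cosine similarity over sparse bag-of-words vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def cluster(nodes, size=2):
    """Toy stand-in for UMAP + Gaussian-mixture clustering: greedily
    group each node with its most similar unassigned peers."""
    remaining = list(nodes)
    clusters = []
    while remaining:
        seed = remaining.pop(0)
        remaining.sort(key=lambda n: similarity(embed(seed), embed(n)),
                       reverse=True)
        group = [seed] + [remaining.pop(0)
                          for _ in range(min(size - 1, len(remaining)))]
        clusters.append(group)
    return clusters

def summarize(texts):
    """Toy stand-in for an LLM abstractive summary of a cluster."""
    return " ".join(texts)[:200]

def build_tree(chunks):
    """Recursively cluster and summarize until a single root remains.
    levels[0] holds the leaf chunks; levels[-1] holds the root summary."""
    levels = [chunks]
    while len(levels[-1]) > 1:
        levels.append([summarize(c) for c in cluster(levels[-1])])
    return levels

levels = build_tree(["cats purr", "dogs bark", "birds sing", "fish swim"])
```

With four leaf chunks and cluster size two, this yields three levels: the leaves, two intermediate summaries, and one root summary.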
If this is right
- Retrieval integrates context across long documents at varying levels of abstraction.
- Performance improves on question-answering tasks that require complex multi-step reasoning.
- State-of-the-art results are achieved on benchmarks such as QuALITY when the tree is used with GPT-4.
- The method supports better incorporation of long-tail knowledge compared with flat retrieval.
Where Pith is reading between the lines
- The tree structure could reduce reliance on ever-larger context windows by supplying relevant summaries on demand.
- Similar hierarchical organization might extend to tasks such as document summarization or multi-hop knowledge extraction.
- Different embedding or clustering choices could be substituted to test robustness of the performance gains.
Load-bearing premise
The recursive clustering and summarization process effectively captures and preserves all relevant information from the original document without significant loss or distortion.
What would settle it
If retrieval from the RAPTOR tree produced lower accuracy than standard chunk retrieval on the QuALITY benchmark, the claimed performance benefit would be falsified.
read the original abstract
Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic understanding of the overall document context. We introduce the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree with differing levels of summarization from the bottom up. At inference time, our RAPTOR model retrieves from this tree, integrating information across lengthy documents at different levels of abstraction. Controlled experiments show that retrieval with recursive summaries offers significant improvements over traditional retrieval-augmented LMs on several tasks. On question-answering tasks that involve complex, multi-step reasoning, we show state-of-the-art results; for example, by coupling RAPTOR retrieval with the use of GPT-4, we can improve the best performance on the QuALITY benchmark by 20% in absolute accuracy.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces RAPTOR, a retrieval-augmented generation method that recursively embeds, clusters, and abstractively summarizes document chunks to construct a multi-level tree. At inference, the model retrieves from this tree to integrate information across different levels of abstraction, addressing limitations of flat chunk retrieval in standard RAG. Controlled experiments on QA benchmarks demonstrate improvements over baselines, with a reported 20% absolute accuracy gain on QuALITY when paired with GPT-4.
Significance. If the results hold under full scrutiny, RAPTOR offers a practical advance in handling long-document context for complex reasoning tasks by enabling hierarchical abstraction retrieval. The approach builds on established embedding and clustering techniques with a novel bottom-up tree construction, and the empirical SOTA claims on multiple QA tasks could influence future RAG designs if the information-preservation properties of the summaries are confirmed.
major comments (3)
- [§4.2] Tree Construction: The recursive clustering and summarization steps lack explicit controls or metrics for information fidelity (e.g., no ROUGE or entailment checks between summaries and source chunks); this is load-bearing for the central claim that the tree preserves usable context without significant loss.
- [§5.3] QuALITY Experiments: The 20% absolute accuracy improvement is reported without variance, statistical significance tests, or an ablation isolating the contribution of multi-level retrieval from GPT-4 prompting alone; this undermines the cross-baseline comparison.
- [§3] Retrieval Algorithm: The inference-time retrieval procedure from the tree is described only at a high level, omitting the precise scoring and selection rules across levels, so the exact integration of summaries and chunks cannot be reproduced.
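For concreteness, one reading of the procedure is RAPTOR's collapsed-tree variant: flatten every level into a single pool, score each node by cosine similarity to the query embedding, and take nodes greedily under a token budget. The sketch below follows that reading; the node embeddings and the word-count token proxy are illustrative assumptions, not the paper's exact rules.

```python
def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def collapsed_tree_retrieve(query_vec, nodes, token_budget=2000):
    """Rank all tree nodes (leaves and summaries alike) by similarity
    to the query, then select greedily until the token budget is spent.
    `nodes` is a list of (embedding, text) pairs drawn from all levels."""
    ranked = sorted(nodes, key=lambda n: cosine(query_vec, n[0]),
                    reverse=True)
    selected, used = [], 0
    for vec, text in ranked:
        cost = len(text.split())  # crude stand-in for a token count
        if used + cost > token_budget:
            break
        selected.append(text)
        used += cost
    return selected

nodes = [([1.0, 0.0], "summary about cats"),
         ([0.0, 1.0], "chunk about dogs"),
         ([0.9, 0.1], "chunk about cats purring")]
top = collapsed_tree_retrieve([1.0, 0.0], nodes, token_budget=7)
```

Here the cat-related summary and chunk fill the seven-word budget and the dog chunk is never reached, illustrating how summaries and leaf chunks compete in one ranked pool.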
minor comments (2)
- [Figure 1] The tree diagram is helpful, but the caption does not specify the embedding model or clustering algorithm used in the illustrated example.
- [Related Work] The comparison to prior hierarchical retrieval methods (e.g., those using sentence embeddings or graph-based structures) could be expanded with quantitative differences.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight important areas for improving rigor and reproducibility. We address each major comment below and will incorporate the suggested changes in the revised manuscript.
read point-by-point responses
-
Referee: [§4.2] Tree Construction: The recursive clustering and summarization steps lack explicit controls or metrics for information fidelity (e.g., no ROUGE or entailment checks between summaries and source chunks); this is load-bearing for the central claim that the tree preserves usable context without significant loss.
Authors: We agree that explicit fidelity metrics would strengthen the central claim. While downstream performance gains provide supporting evidence, we will add quantitative controls in the revised Section 4.2, including average ROUGE-L scores and NLI entailment rates between each summary and its source chunks across tree levels. revision: yes
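The ROUGE-L control the authors commit to is simple to state: it is the longest common subsequence (LCS) of tokens shared by summary and source, normalized into an F-score. A minimal sketch of the metric (not the authors' evaluation harness):

```python
def lcs_len(a, b):
    """Longest common subsequence length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if x == y
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[-1][-1]

def rouge_l(summary, source, beta=1.2):
    """ROUGE-L F-score between a summary and its source chunk."""
    s, r = summary.split(), source.split()
    lcs = lcs_len(s, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(s), lcs / len(r)
    return (1 + beta ** 2) * prec * rec / (rec + beta ** 2 * prec)

score = rouge_l("cats purr loudly", "cats often purr loudly at home")
```

Averaging this score between each tree-level summary and its source cluster would give exactly the fidelity curve the revision promises.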
-
Referee: [§5.3] QuALITY Experiments: The 20% absolute accuracy improvement is reported without variance, statistical significance tests, or an ablation isolating the contribution of multi-level retrieval from GPT-4 prompting alone; this undermines the cross-baseline comparison.
Authors: We acknowledge the importance of statistical reporting. In the revision we will include standard deviations over multiple runs, perform significance tests, and add an ablation in Section 5.3 that directly compares hierarchical RAPTOR retrieval against flat retrieval using identical GPT-4 prompting to isolate the multi-level contribution. revision: yes
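The promised significance testing can be as lightweight as a paired bootstrap over per-question correctness. The sketch below uses hypothetical 0/1 outcomes, not numbers from the paper:

```python
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=10000, seed=0):
    """Fraction of bootstrap resamples in which system A outscores
    system B, given paired per-question correctness (1 = correct).
    Values near 1.0 indicate a robust advantage for A."""
    rng = random.Random(seed)
    n, wins = len(scores_a), 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / n_resamples

# Hypothetical per-question results: hierarchical vs. flat retrieval,
# identical GPT-4 prompting on both sides.
raptor = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
flat   = [1, 0, 1, 0, 1, 0, 0, 1, 0, 1]
p_win = paired_bootstrap(raptor, flat)
```

Because the resampling is paired at the question level, the test isolates the retrieval method as the only varying factor, which is exactly the ablation the referee requests.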
-
Referee: [§3] Retrieval Algorithm: The inference-time retrieval procedure from the tree is described only at a high level, omitting the precise scoring and selection rules across levels, so the exact integration of summaries and chunks cannot be reproduced.
Authors: We will expand Section 3 with a detailed description of the retrieval procedure, including the exact cosine-similarity scoring function, level-specific top-k thresholds, and the prompt-integration rules for combining summaries and chunks. Pseudocode will be added to ensure full reproducibility. revision: yes
Circularity Check
No significant circularity in derivation chain
full rationale
The paper presents RAPTOR as an explicit algorithmic construction: recursive bottom-up embedding, clustering, and abstractive summarization to form a multi-level tree, followed by top-down retrieval at inference. Performance claims (e.g., +20% absolute on QuALITY with GPT-4) rest on controlled empirical comparisons against flat-chunk RAG baselines, not on any mathematical derivation that reduces to its own fitted inputs or self-citations. No self-definitional loops, renamed predictions, or load-bearing self-citations appear in the method or results sections. The approach is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Abstractive summaries at different levels retain sufficient information for multi-step reasoning
Forward citations
Cited by 21 Pith papers
-
ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts
ShadowMerge poisons graph-based agent memory via relation-channel conflicts using an AIR pipeline, achieving 93.8% average attack success rate on Mem0 and three real-world datasets while bypassing existing defenses.
-
ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts
ShadowMerge poisons graph-based agent memory by creating relation-channel conflicts that get extracted and retrieved, achieving 93.8% attack success rate on Mem0 and datasets like PubMedQA while evading prior defenses.
-
Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems
Goal-Mem improves RAG memory retrieval in agentic LLMs by explicit goal decomposition and backward chaining via Natural Language Logic, outperforming nine baselines on multi-hop and implicit inference tasks.
-
OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory
OCR-Memory encodes agent trajectories as images with visual anchors and retrieves verbatim text via locate-and-transcribe, yielding gains on long-horizon benchmarks under strict context limits.
-
Do We Still Need GraphRAG? Benchmarking RAG and GraphRAG for Agentic Search Systems
Agentic search narrows the gap between dense RAG and GraphRAG but does not remove GraphRAG's advantage on complex multi-hop reasoning.
-
CLAG: Adaptive Memory Organization via Agent-Driven Clustering for Small Language Model Agents
CLAG organizes agent memory into clusters via an SLM router and uses cluster profiles for two-stage retrieval, yielding better answer quality on QA benchmarks than prior memory systems.
-
ASTRA-QA: A Benchmark for Abstract Question Answering over Documents
ASTRA-QA is a benchmark for abstract document question answering that uses explicit topic sets, unsupported content annotations, and evidence alignments to enable direct scoring of coverage and hallucination.
-
An Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy Orchestration
Experience-RAG Skill uses experience memory to dynamically select retrieval strategies for agents, achieving 0.8924 nDCG@10 on BeIR/nq, hotpotqa, and scifact while outperforming fixed single-retriever baselines.
-
FT-RAG: A Fine-grained Retrieval-Augmented Generation Framework for Complex Table Reasoning
FT-RAG introduces a fine-grained graph-based retrieval framework for tables plus a new 9870-pair benchmark, reporting 23.5% and 59.2% gains in table- and cell-level hit rates and 62.2% higher exact-value recall over b...
-
MemORAI: Memory Organization and Retrieval via Adaptive Graph Intelligence for LLM Conversational Agents
MemORAI combines selective filtering, provenance tracking in multi-relational graphs, and dynamic weighted PageRank retrieval to achieve state-of-the-art memory retrieval and personalized responses in LLM agents on LO...
-
MindTrellis: Co-Creating Knowledge Structures with AI through Interactive Visual Exploration
MindTrellis enables users and AI to co-create evolving knowledge graphs, outperforming retrieval-only tools in expert-rated content coverage, structural quality, and reduced cognitive load during a study of 12 partici...
-
Memanto: Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents
Memanto delivers 89.8% and 87.1% accuracy on LongMemEval and LoCoMo benchmarks using typed semantic memory and information-theoretic retrieval, outperforming hybrid graph and vector systems with a single query and zer...
-
Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework
A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art on standard tasks.
-
GraphRAG-Router: Learning Cost-Efficient Routing over GraphRAGs and LLMs with Reinforcement Learning
GraphRAG-Router uses two-stage reinforcement learning with a cost-aware curriculum reward to route queries across heterogeneous GraphRAGs and LLMs, cutting large-LLM overuse by nearly 30 percent while matching baselin...
-
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
GraphRAG improves comprehensiveness and diversity of answers to global questions over million-token document sets by constructing entity graphs and hierarchical community summaries before combining partial responses.
-
How Does Chunking Affect Retrieval-Augmented Code Completion? A Controlled Empirical Study
Function-based chunking underperforms other strategies in RAG code completion by 3.57-5.64 points, with context length as the dominant factor.
-
GRAVITY: Architecture-Agnostic Structured Anchoring for Long-Horizon Conversational Memory
GRAVITY adds structured relational, temporal, and thematic memory anchors to conversational LLMs at generation time, delivering 7.5-10.1% average gains in LLM-judge accuracy across five host systems on LongMemEval and LoCoMo.
-
Stateful Evidence-Driven Retrieval-Augmented Generation with Iterative Reasoning
A stateful iterative RAG system converts retrieved documents into scored reasoning units, maintains supportive and non-supportive evidence, and performs deficiency-driven query refinement to achieve more robust QA per...
-
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
The paper surveys hallucination in LLMs with an innovative taxonomy, factors, detection methods, benchmarks, mitigation strategies, and open research directions.
-
An Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy Orchestration
Experience-RAG Skill is a reusable agent skill that selects retrieval strategies via experience memory, achieving 0.8924 nDCG@10 on BeIR/nq, hotpotqa, and scifact while outperforming fixed retriever baselines.
-
Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering
Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.
Reference graph
Works this paper leans on
-
[1]
On the Surprising Behavior of Distance Metrics in High Dimensional Space
Charu C. Aggarwal, Alexander Hinneburg, and Daniel A. Keim. In Database Theory: ICDT 2001, pp. 420–434. Springer, 2001. URL https://link.springer.com/chapter/10.1007/3-540-44503-x_27
-
[7]
Improving language models by retrieving from trillions of tokens
Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al. In International Conference on Machine Learning, pp. 2206–2240. PMLR, 2022. URL https://arxiv.org/abs/2112.04426
-
[8]
Language Models are Few-Shot Learners
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, et al. Advances in Neural Information Processing Systems, 33, 2020.
-
[12]
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. arXiv preprint arXiv:2204.02311, 2022. URL https://arxiv.org/abs/2204.02311
-
[13]
Contextualizing citations for scientific summarization using word embeddings and domain knowledge
Arman Cohan and Nazli Goharian. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1133–1136, 2017. URL https://dl.acm.org/doi/abs/10.1145/3077136.3080740
-
[15]
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao, Dan Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. Advances in Neural Information Processing Systems, 35:16344–16359, 2022. URL https://arxiv.org/abs/2205.14135
-
[17]
CoLISA: Inner Interaction via Contrastive Learning for Multi-choice Reading Comprehension
Mengxing Dong, Bowei Zou, Yanling Li, and Yu Hong. In Advances in Information Retrieval: 45th European Conference on Information Retrieval, ECIR 2023, Proceedings, Part I, pp. 264–278. Springer, 2023.
-
[21]
REALM: Retrieval-Augmented Language Model Pre-Training
Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. In International Conference on Machine Learning, pp. 3929–3938. PMLR, 2020. URL https://doi.org/10.48550/arXiv.2002.08909
-
[25]
How can we know what language models know?
Zhengbao Jiang, Frank F. Xu, Jun Araki, and Graham Neubig. Transactions of the Association for Computational Linguistics, 8:423–438, 2020. URL https://arxiv.org/abs/1911.12543
-
[26]
Billion-scale similarity search with GPUs
Jeff Johnson, Matthijs Douze, and Hervé Jégou. IEEE Transactions on Big Data, 7(3):535–547, 2019. URL https://arxiv.org/abs/1702.08734
-
[27]
Large Language Models Struggle to Learn Long-Tail Knowledge
Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, and Colin Raffel. In International Conference on Machine Learning, pp. 15696–15707. PMLR, 2023. URL https://proceedings.mlr.press/v202/kandpal23a/kandpal23a.pdf
-
[30]
ColBERT: Efficient and effective passage search via contextualized late interaction over BERT
Omar Khattab and Matei Zaharia. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 39–48, 2020. URL https://arxiv.org/abs/2004.12832
-
[31]
The NarrativeQA Reading Comprehension Challenge
Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, and Edward Grefenstette. Transactions of the Association for Computational Linguistics, 6:317–328, 2018. URL https://arxiv.org/abs/1712.07040
-
[32]
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Advances in Neural Information Processing Systems, 33:9459–9474, 2020. URL https://doi.org/10.48550/arXiv.2005.11401
-
[33]
LlamaIndex
Jerry Liu. 2022. URL https://github.com/jerryjliu/llama_index
-
[36]
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
Leland McInnes, John Healy, and James Melville. arXiv preprint arXiv:1802.03426, 2018. URL https://arxiv.org/abs/1802.03426
-
[39]
Memory-based model editing at scale
Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D. Manning, and Chelsea Finn. In International Conference on Machine Learning, pp. 15817–15831. PMLR, 2022. URL https://proceedings.mlr.press/v162/mitchell22a/mitchell22a.pdf
-
[43]
GPT-4 Technical Report
OpenAI. arXiv preprint arXiv:2303.08774, 2023. URL https://arxiv.org/abs/2303.08774
-
[44]
QuALITY: Question Answering with Long Input Texts, Yes!
Richard Yuanzhe Pang, Alicia Parrish, Nitish Joshi, Nikita Nangia, Jason Phang, Angelica Chen, Vishakh Padmakumar, Johnny Ma, Jana Thompson, He He, and Samuel Bowman. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022.
-
[46]
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, et al. arXiv preprint arXiv:2112.11446, 2021. URL https://arxiv.org/abs/2112.11446
-
[50]
The Probabilistic Relevance Framework: BM25 and Beyond
Stephen Robertson, Hugo Zaragoza, et al. Foundations and Trends in Information Retrieval, 3(4):333–389, 2009. URL https://doi.org/10.1561/1500000019
-
[51]
Okapi at TREC-3
Stephen E. Robertson, Steve Walker, Susan Jones, Micheline M. Hancock-Beaulieu, Mike Gatford, et al. NIST Special Publication SP, 109:109, 1995. URL https://www.microsoft.com/en-us/research/publication/okapi-at-trec-3/
-
[53]
Estimating the Dimension of a Model
Gideon Schwarz. The Annals of Statistics, pp. 461–464, 1978. URL https://projecteuclid.org/journals/annals-of-statistics/volume-6/issue-2/Estimating-the-Dimension-of-a-Model/10.1214/aos/1176344136.full
-
[54]
A Statistical Interpretation of Term Specificity and its Application in Retrieval
Karen Spärck Jones. Journal of Documentation, 28(1):11–21, 1972. URL https://doi.org/10.1108/eb026526
-
[57]
oLMpics: on what language model pre-training captures
Alon Talmor, Yanai Elazar, Yoav Goldberg, and Jonathan Berant. Transactions of the Association for Computational Linguistics, 8:743–758, 2020. URL https://arxiv.org/abs/1912.13283
-
[61]
Generate rather than retrieve: Large language models are strong context generators
Wenhao Yu, Dan Iter, Shuohang Wang, Yichong Xu, Mingxuan Ju, Soumya Sanyal, Chenguang Zhu, Michael Zeng, and Meng Jiang. 2022. URL https://arxiv.org/abs/2209.10063
-
[65]
Hybrid Hierarchical Retrieval for Open-Domain Question Answering
Manoj Ghuhan Arivazhagan, Lan Liu, Peng Qi, Xinchi Chen, William Yang Wang, and Zhiheng Huang. Findings of the Association for Computational Linguistics: ACL 2023. doi:10.18653/v1/2023.findings-acl.679
-
[66]
Dense Hierarchical Retrieval for Open-domain Question Answering
Ye Liu, Kazuma Hashimoto, Yingbo Zhou, Semih Yavuz, Caiming Xiong, and Philip Yu. Findings of the Association for Computational Linguistics: EMNLP 2021. doi:10.18653/v1/2021.findings-emnlp.19
-
[67]
Extractive is not Faithful: An Investigation of Broad Unfaithfulness Problems in Extractive Summarization
Shiyue Zhang, David Wan, and Mohit Bansal. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023. doi:10.18653/v1/2023.acl-long.120
-
[68]
A Controllable QA-based Framework for Decontextualization
arXiv preprint arXiv:2305.14772.
-
[69]
Joint Passage Ranking for Diverse Multi-Answer Retrieval
Sewon Min, Kenton Lee, Ming-Wei Chang, Kristina Toutanova, and Hannaneh Hajishirzi. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021. doi:10.18653/v1/2021.emnlp-main.560
-
[70]
Enabling Large Language Models to Generate Text with Citations
arXiv preprint arXiv:2305.14627.
-
[71]
Do Long-Range Language Models Actually Use Long-Range Context?
Simeng Sun, Kalpesh Krishna, Andrew Mattarella-Micke, and Mohit Iyyer. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021. doi:10.18653/v1/2021.emnlp-main.62
-
[72]
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, and Yinfei Yang. Findings of the Association for Computational Linguistics: NAACL 2022. doi:10.18653/v1/2022.findings-naacl.55
-
[74]
A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers
Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A. Smith, and Matt Gardner. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021.
-
[75]
Dense Passage Retrieval for Open-Domain Question Answering
Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020. doi:10.18653/v1/2020.emnlp-main.550
-
[77]
Know What You Don't Know: Unanswerable Questions for SQuAD
Pranav Rajpurkar, Robin Jia, and Percy Liang. Association for Computational Linguistics (ACL), 2018.
-
[82]
Shall we pretrain autoregressive language models with retrieval? A comprehensive study
arXiv preprint arXiv:2304.06762.
-
[85]
ROUGE: A Package for Automatic Evaluation of Summaries
Chin-Yew Lin. Text Summarization Branches Out, 2004.
-
[97]
UnifiedQA: Crossing Format Boundaries with a Single QA System
Daniel Khashabi, Sewon Min, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, and Hannaneh Hajishirzi. Findings of the Association for Computational Linguistics: EMNLP 2020. doi:10.18653/v1/2020.findings-emnlp.171
-
[98]
Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. doi:10.1145/3571730
-
[99]
Gautier Izacard and Edouard Grave. arXiv:2007.01282.
-
[101]
Gautier Izacard and Edouard Grave. arXiv:2012.04584.
-
[102]
Capabilities of GPT-4 on Medical Challenge Problems
Harsha Nori, Nicholas King, Scott Mayer McKinney, Dean Carignan, and Eric Horvitz. arXiv:2303.13375, 2023.
-
[104]
Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, and Arman Cohan. arXiv:2004.05150, 2020.
-
[105]
Frustratingly Hard Evidence Retrieval for QA Over Books
Xiangyang Mou, Mo Yu, Bingsheng Yao, Chenghao Yang, Xiaoxiao Guo, Saloni Potdar, and Hui Su. In Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events, 2020. doi:10.18653/v1/2020.nuse-1.13
-
[106]
ColBERTv2: Effective and efficient retrieval via lightweight late interaction
arXiv preprint arXiv:2112.01488.
-
[108]
Questions Are All You Need to Train a Dense Passage Retriever
Devendra Singh Sachan, Mike Lewis, Dani Yogatama, Luke Zettlemoyer, Joelle Pineau, and Manzil Zaheer. Transactions of the Association for Computational Linguistics, 2023. doi:10.1162/tacl_a_00564
-
[109]
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers and Iryna Gurevych. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019. doi:10.18653/v1/D19-1410
-
[110]
Yury Zemlyanskiy, Joshua Ainslie, Michiel de Jong, Philip Pham, Ilya Eckstein, and Fei Sha. arXiv:2105.04241.
-
[111]
Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, and Paul Christiano. arXiv:2109.10862.
-
[112]
Michihiro Yasunaga, Antoine Bosselut, Hongyu Ren, Xikun Zhang, Christopher D. Manning, Percy Liang, and Jure Leskovec. arXiv:2210.09338.
-
[113]
Reading Wikipedia to Answer Open-Domain Questions
In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017. doi:10.18653/v1/P17-1171
-
[114]
Haejun Lee, Akhil Kedia, Jongwon Lee, Ashwin Paranjape, Christopher D. Manning, and Kyoung-Gu Woo. arXiv:2112.07381.
-
[115]
Zhengbao Jiang, Luyu Gao, Jun Araki, Haibo Ding, Zhiruo Wang, Jamie Callan, and Graham Neubig. arXiv:2212.02027.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.