Recognition: 2 theorem links
Mitigating Hallucination on Hallucination in RAG via Ensemble Voting
Pith reviewed 2026-05-14 22:41 UTC · model grok-4.3
The pith
VOTE-RAG reduces compounded RAG hallucinations by voting across multiple retrieval queries and independent answers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
VOTE-RAG is a two-stage ensemble voting framework: it first aggregates documents through parallel retrieval voting over diverse queries, then resolves the answer through response voting, taking the majority among independently generated outputs. On six benchmark datasets it achieves performance comparable to or surpassing more complex frameworks.
What carries the argument
Two-stage voting mechanism: retrieval voting pools documents from multiple parallel diverse queries, followed by response voting that selects the majority answer from independent generations based on the pooled documents.
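The two-stage mechanism described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `make_queries`, `retrieve`, and `generate` interfaces, the deduplication step, and the agent counts are assumptions for the sketch.

```python
from collections import Counter
from typing import Callable, List

def vote_rag(question: str,
             make_queries: Callable[[str, int], List[str]],  # diverse query generation (assumed interface)
             retrieve: Callable[[str], List[str]],           # retriever returning documents for one query
             generate: Callable[[str, List[str]], str],      # one independent answer from the pooled docs
             n_query_agents: int = 4,
             n_answer_agents: int = 5) -> str:
    # Stage 1: Retrieval Voting — pool documents from diverse parallel queries.
    pooled: List[str] = []
    for q in make_queries(question, n_query_agents):
        for doc in retrieve(q):
            if doc not in pooled:  # deduplicate while preserving order
                pooled.append(doc)
    # Stage 2: Response Voting — independent generations, majority answer wins.
    answers = [generate(question, pooled) for _ in range(n_answer_agents)]
    winner, _ = Counter(answers).most_common(1)[0]
    return winner
```

Because neither stage depends on a previous agent's output, both loops can run concurrently, which is what the parallelizability claim rests on.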
If this is right
- Performance matches or exceeds that of more complex RAG frameworks on six standard benchmarks.
- The architecture stays simpler and remains fully parallelizable, avoiding sequential refinement steps.
- Problem drift risk disappears because the original query is never altered during the process.
- No training or fine-tuning is required, allowing direct use with existing models.
Where Pith is reading between the lines
- The same voting pattern could be applied to reduce other LLM consistency failures outside retrieval settings.
- Increasing the number of parallel agents may improve accuracy further provided compute budgets allow.
- The method can wrap existing RAG pipelines with only minor changes to query and answer generation calls.
Load-bearing premise
Majority voting among independently generated responses will reliably select the correct answer when the retrieved documents contain misleading content that could prompt consistent hallucinations.
What would settle it
A controlled run on any of the six benchmarks in which a majority of agents produce the same incorrect answer while a minority produces the correct one, causing the final vote to select the error.
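That decisive failure case is easy to state concretely: majority voting selects whatever answer is most common, so if misleading pooled documents push most agents toward the same wrong answer, the vote locks the error in. A toy illustration (the labels are hypothetical, not benchmark data):

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer (ties broken by first occurrence)."""
    return Counter(answers).most_common(1)[0][0]

# Hypothetical run: misleading context makes 3 of 5 agents repeat the
# same hallucination, so the final vote selects the error.
agents = ["hallucinated", "hallucinated", "hallucinated", "correct", "correct"]
assert majority_vote(agents) == "hallucinated"
```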
Original abstract
Retrieval-Augmented Generation (RAG) aims to reduce hallucinations in Large Language Models (LLMs) by integrating external knowledge. However, RAG introduces a critical challenge: "hallucination on hallucination," where flawed retrieval results mislead the generation model, leading to compounded hallucinations. To address this issue, we propose VOTE-RAG, a novel, training-free framework with a two-stage structure and efficient, parallelizable voting mechanisms. VOTE-RAG includes: (1) Retrieval Voting, where multiple agents generate diverse queries in parallel and aggregate all retrieved documents; (2) Response Voting, where multiple agents independently generate answers based on the aggregated documents, with the final output determined by majority vote. We conduct comparative experiments on six benchmark datasets. Our results show that VOTE-RAG achieves performance comparable to or surpassing more complex frameworks. Additionally, VOTE-RAG features a simpler architecture, is fully parallelizable, and avoids the "problem drift" risk. Our work demonstrates that simple, reliable ensemble voting is a superior and more efficient method for mitigating RAG hallucinations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes VOTE-RAG, a training-free two-stage ensemble framework to mitigate 'hallucination on hallucination' in RAG. Stage 1 (Retrieval Voting) generates diverse queries in parallel and aggregates retrieved documents; Stage 2 (Response Voting) has multiple agents independently generate answers from the aggregated context and selects the majority-vote output. The central claim is that this achieves performance comparable to or better than more complex methods on six benchmarks while remaining simpler, fully parallelizable, and free of problem-drift risk.
Significance. If the empirical claims hold with proper controls and statistics, the work would demonstrate that lightweight, training-free majority voting can reliably outperform or match elaborate RAG variants, offering a practical, scalable baseline for hallucination mitigation that emphasizes simplicity and reproducibility.
major comments (3)
- [Abstract / Experiments] Abstract and Experiments section: comparative results on six benchmarks are asserted without any reported metrics, baselines, error bars, statistical significance tests, or per-dataset tables, so the performance claim cannot be evaluated and is load-bearing for the entire contribution.
- [Section 3.2] Response Voting description (Section 3.2): the mechanism assumes the correct answer remains the mode even when retrieval contains misleading documents, yet no agreement rates, tie-resolution procedure, number of agents, or error-case analysis (e.g., instances where consistent hallucinations outvote the truth) are provided; this directly tests the core 'hallucination on hallucination' mitigation hypothesis.
- [Section 3 / Experiments] Method and Experiments: no specification of how query diversity is generated, how many agents are used, or how the aggregated document set is truncated, all of which affect both the parallelizability claim and the reproducibility of the reported gains.
minor comments (1)
- [Abstract] The phrase 'problem drift' is placed in quotes in the abstract but never defined, nor is it explained how the proposed method avoids this risk.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important gaps in the presentation of our results and implementation details. We will revise the manuscript to address each point, adding the necessary tables, specifications, and analyses to strengthen the empirical support and reproducibility of VOTE-RAG.
Point-by-point responses
-
Referee: [Abstract / Experiments] Abstract and Experiments section: comparative results on six benchmarks are asserted without any reported metrics, baselines, error bars, statistical significance tests, or per-dataset tables, so the performance claim cannot be evaluated and is load-bearing for the entire contribution.
Authors: We agree that the experimental claims require fuller documentation to be evaluable. In the revised manuscript we will insert a main results table (and appendix tables) reporting exact metrics for VOTE-RAG and every baseline on each of the six datasets, include standard-error bars from multiple runs, and add paired statistical significance tests (e.g., McNemar or t-tests) against the strongest baselines. revision: yes
-
Referee: [Section 3.2] Response Voting description (Section 3.2): the mechanism assumes the correct answer remains the mode even when retrieval contains misleading documents, yet no agreement rates, tie-resolution procedure, number of agents, or error-case analysis (e.g., instances where consistent hallucinations outvote the truth) are provided; this directly tests the core 'hallucination on hallucination' mitigation hypothesis.
Authors: We will expand Section 3.2 with the missing specifications: number of response agents (5), tie-resolution rule (highest average token-level confidence, else random among tied answers), and per-instance agreement rates. We will also add a dedicated error-analysis subsection that quantifies cases in which consistent hallucinations outvote the correct answer, thereby directly testing the core hypothesis. revision: yes
-
Referee: [Section 3 / Experiments] Method and Experiments: no specification of how query diversity is generated, how many agents are used, or how the aggregated document set is truncated, all of which affect both the parallelizability claim and the reproducibility of the reported gains.
Authors: We accept that these implementation details are required for reproducibility. The revision will specify: (i) query diversity generation via prompt paraphrasing and temperature sampling, (ii) exact agent counts (4 for retrieval voting, 5 for response voting), and (iii) truncation of the aggregated document pool to the top-10 unique passages after deduplication. These additions will also clarify the parallel execution schedule. revision: yes
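The tie-resolution rule the authors promise in the response above can be sketched as follows. The rule (majority vote, ties broken by highest average token-level confidence, then randomly) comes from the rebuttal; the data shapes and function signature here are illustrative assumptions.

```python
import random
from collections import Counter
from typing import Dict, List

def resolve_vote(answers: List[str],
                 confidences: Dict[str, float],
                 seed: int = 0) -> str:
    """Majority vote with the rebuttal's tie-resolution rule:
    ties go to the answer with the highest average token-level
    confidence, remaining ties are broken at random."""
    counts = Counter(answers)
    top = max(counts.values())
    tied = [a for a, c in counts.items() if c == top]
    if len(tied) == 1:
        return tied[0]
    best = max(confidences.get(a, 0.0) for a in tied)
    tied = [a for a in tied if confidences.get(a, 0.0) == best]
    return random.Random(seed).choice(tied)
```

With the rebuttal's stated five response agents, a 2-2-1 split is the typical case where this rule, rather than the raw vote, decides the output.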
Circularity Check
No circularity: VOTE-RAG is a direct procedural ensemble method with no derivation chain
Full rationale
The paper describes VOTE-RAG as a training-free, two-stage procedural framework consisting of parallel query generation for retrieval aggregation followed by independent response generation and majority voting. No equations, fitted parameters, ansatzes, or first-principles derivations are present. Performance claims rest on empirical benchmark comparisons rather than any 'prediction' that reduces to the method's own inputs by construction. No self-citations, uniqueness theorems, or load-bearing references to prior author work are invoked to justify core steps. The central mechanism (majority vote over LLM outputs) is presented as an explicit algorithmic choice, not derived from or equivalent to its own outputs. This is a standard non-circular empirical proposal.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Majority vote among independent generations selects the non-hallucinated answer when retrieval is flawed.
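The force of this assumption can be made visible with a toy Condorcet-style calculation: if agents were truly independent, majority vote would amplify any per-agent accuracy above 50%, but it amplifies error just as readily below 50%. Real VOTE-RAG agents share one document pool, so the independence assumed here is exactly what flawed retrieval undermines.

```python
from math import comb

def p_majority_correct(n: int, p: float) -> float:
    """Probability that more than n/2 of n independent agents answer
    correctly, each with accuracy p (a toy model; shared retrieved
    context makes real agents correlated, not independent)."""
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

# With independence, 5 agents at 70% each yield a stronger majority
# (~0.837), but at 40% each the vote is right only ~0.317 of the time.
assert p_majority_correct(5, 0.7) > 0.7
assert p_majority_correct(5, 0.4) < 0.4
```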
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear — "VOTE-RAG includes: (1) Retrieval Voting, where multiple agents generate diverse queries in parallel and aggregate all retrieved documents; (2) Response Voting, where multiple agents independently generate answers based on the aggregated documents, with the final output determined by majority vote."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear — "simple, reliable ensemble voting is a superior and more efficient method for mitigating RAG hallucinations"
Forward citations
Cited by 1 Pith paper
-
Agentic Retrieval-Augmented Generation for Financial Document Question Answering
FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9...
Reference graph
Works this paper leans on
- [1] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat et al., "GPT-4 technical report," arXiv preprint arXiv:2303.08774, 2023. [Online]. Available: https://arxiv.org/abs/2303.08774
- [2] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale et al., "Llama 2: Open foundation and fine-tuned chat models," arXiv preprint arXiv:2307.09288, 2023. [Online]. Available: https://arxiv.org/abs/2307.09288
- [3] A. Palikhe, Z. Yu, Z. Wang, and W. Zhang, "Towards transparent AI: A survey on explainable large language models," arXiv preprint arXiv:2506.21812, 2025.
- [4] Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. J. Bang, A. Madotto, and P. Fung, "Survey of hallucination in natural language generation," ACM Computing Surveys, vol. 55, no. 12, pp. 1–38, 2023. [Online]. Available: https://dl.acm.org/doi/10.1145/3571730
- [5] L. Huang, W. Yu, W. Ma, W. Zhong, Z. Feng, H. Wang, Q. Chen, W. Peng, X. Feng, B. Qin et al., "A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions," ACM Transactions on Information Systems, 2024. [Online]. Available: https://dl.acm.org/doi/10.1145/3703155
- [6] Z. Yu, M. Y. I. Idris, P. Wang, and R. Qureshi, "DINOv3-powered multi-task foundation model for quantitative remote sensing estimation," AAAI 2026, vol. 40, no. 48, pp. 41455–41456, 2026.
- [7] Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai, J. Sun, and H. Wang, "Retrieval-augmented generation for large language models: A survey," arXiv preprint arXiv:2312.10997, 2023. [Online]. Available: https://arxiv.org/abs/2312.10997
- [8] W. Hu, W. Zhang, Y. Jiang, C. J. Zhang, X. Wei, and L. Qing, "Removal of hallucination on hallucination: Debate-augmented RAG," in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vienna, Austria: Association for Computational Linguistics, Jul. 2025, pp. 15839–15853. [Online]. Available: ht...
- [9] A. Sarkar, M. Y. I. Idris, and Z. Yu, "Reasoning in computer vision: Taxonomy, models, tasks, and methodologies," arXiv preprint arXiv:2508.10523, 2025.
- [10] Z. Xie, C. Wang, Y. Wang, S. Cai, S. Wang, and T. Jin, "Chat-driven text generation and interaction for person retrieval," in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025, pp. 5259–5270.
- [11] Z. Xie, "Conquer: Context-aware representation with query enhancement for text-based person search," arXiv preprint arXiv:2601.18625, 2026.
- [12] Z. Yu and C. S. Chan, "Yielding unblemished aesthetics through a unified network for visual imperfections removal in generated images," AAAI 2025, vol. 39, no. 9, pp. 9716–9724, 2025.
- [13] Z. Yu, J. Wang, H. Chen, and M. Y. I. Idris, "QRS-TRS: Style transfer-based image-to-image translation for carbon stock estimation in quantitative remote sensing," IEEE Access, 2025.
- [14] Z. Xie, X. Liu, B. Zhang, Y. Lin, S. Cai, and T. Jin, "HVD: Human vision-driven video representation learning for text-video retrieval," arXiv preprint arXiv:2601.16155, 2026.
- [15] Z. Xie, B. Zhang, Y. Lin, and T. Jin, "Delving deeper: Hierarchical visual perception for robust video-text retrieval," arXiv preprint arXiv:2601.12768, 2026.
- [16] H. Trivedi, N. Balasubramanian, T. Khot, and A. Sabharwal, "Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions," in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Toronto, Canada: Association for Computational Linguistics, Jul. 2023, pp. 10 0...
- [17] Z. Shao, Y. Gong, Y. Shen, M. Huang, N. Duan, and W. Chen, "Enhancing retrieval-augmented large language models with iterative retrieval-generation synergy," in Findings of the Association for Computational Linguistics: EMNLP 2023, H. Bouamor, J. Pino, and K. Bali, Eds. Singapore: Association for Computational Linguistics, Dec. 2023, pp. 9248–9274. [Onli...
- [18] Z. Feng, X. Feng, D. Zhao, M. Yang, and B. Qin, "Retrieval-generation synergy augmented large language models," in ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024, pp. 11661–11665. [Online]. Available: https://ieeexplore.ieee.org/document/10448015
- [19] A. Asai, Z. Wu, Y. Wang, A. Sil, and H. Hajishirzi, "Self-RAG: Learning to retrieve, generate, and critique through self-reflection," arXiv preprint arXiv:2310.11511, 2023. [Online]. Available: https://arxiv.org/abs/2310.11511
- [20] Z. Yu, M. Y. I. Idris, P. Wang, Y. Xia, and Y. Xiang, "ForgetMe: Benchmarking the selective forgetting capabilities of generative models," EAAI, vol. 161, p. 112087, 2025.
- [21] H. K. Choi, X. Zhu, and S. Li, "Debate or vote: Which yields better decisions in multi-agent large language models?" in Advances in Neural Information Processing Systems, 2025.
- [22] Z. Yu, H. Jiang, P. Wang, Z. Lin, and Y. Xiang, "Spatiotemporal alignment for remote sensing image recovery via terrain-aware diffusion," ICASSP 2026, 2026.
- [23] Z. Yu, M. Y. I. Idris, P. Wang, and R. Qureshi, "CoTextor: Training-free modular multilingual text editing via layered disentanglement and depth-aware fusion," in NeurIPS 2025, 2025.
- [24] K. Guu, K. Lee, Z. Tung, P. Pasupat, and M. Chang, "Retrieval augmented language model pre-training," in International Conference on Machine Learning. PMLR, 2020, pp. 3929–3938. [Online]. Available: https://dl.acm.org/doi/abs/10.5555/3524938.3525306
- [25] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel et al., "Retrieval-augmented generation for knowledge-intensive NLP tasks," Advances in Neural Information Processing Systems, vol. 33, pp. 9459–9474, 2020. [Online]. Available: https://arxiv.org/abs/2005.11401
- [26] G. Izacard and E. Grave, "Leveraging passage retrieval with generative models for open domain question answering," in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, P. Merlo, J. Tiedemann, and R. Tsarfaty, Eds. Online: Association for Computational Linguistics, Apr. 2021, pp. 874–8...
- [27] G. Izacard, P. Lewis, M. Lomeli, L. Hosseini, F. Petroni, T. Schick, J. Dwivedi-Yu, A. Joulin, S. Riedel, and E. Grave, "Few-shot learning with retrieval augmented language models," arXiv preprint arXiv:2208.03299, 2022. [Online]. Available: https://arxiv.org/abs/2208.03299
- [28] W. Shi, S. Min, M. Yasunaga, M. Seo, R. James, M. Lewis, L. Zettlemoyer, and W.-t. Yih, "REPLUG: Retrieval-augmented black-box language models," in Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), K. Duh, H. Gomez, and S. Bethard, Eds. Me...
- [29] X.-Y. Wei and C.-W. Ngo, "Fusing semantics, observability, reliability and diversity of concept detectors for video search," in Proceedings of the 16th ACM International Conference on Multimedia, 2008, pp. 81–90. [Online]. Available: https://dl.acm.org/doi/10.1145/1459359.1459371
- [30] J. Becker, "Multi-agent large language models for conversational task-solving," arXiv preprint arXiv:2410.22932, 2024. [Online]. Available: https://arxiv.org/abs/2410.22932
- [31] Z. Li, Z. Fu, Y. Hu, Z. Chen, H. Wen, and L. Nie, "FineCIR: Explicit parsing of fine-grained modification semantics for composed image retrieval," https://arxiv.org/abs/2503.21309, 2025.
- [32] Z. Li, Z. Chen, H. Wen, Z. Fu, Y. Hu, and W. Guan, "Encoder: Entity mining and modification relation binding for composed image retrieval," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 5, 2025, pp. 5101–5109.
- [33] Z. Chen, Y. Hu, Z. Li, Z. Fu, H. Wen, and W. Guan, "HUD: Hierarchical uncertainty-aware disambiguation network for composed video retrieval," in Proceedings of the ACM International Conference on Multimedia, 2025, pp. 6143–6152.
- [34] Z. Chen, Y. Hu, Z. Fu, Z. Li, J. Huang, Q. Huang, and Y. Wei, "Intent: Invariance and discrimination-aware noise mitigation for robust composed image retrieval," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 25, 2026, pp. 20463–20471.
- [35] Z. Chen, Y. Hu, Z. Li, Z. Fu, X. Song, and L. Nie, "Offset: Segmentation-based focus shift revision for composed image retrieval," in Proceedings of the ACM International Conference on Multimedia, 2025, pp. 6113–6122.
- [36] Y. Hu, Z. Li, Z. Chen, Q. Huang, Z. Fu, M. Xu, and L. Nie, "Refine: Composed video retrieval via shared and differential semantics enhancement," ACM Transactions on Multimedia Computing, Communications and Applications, 2026.
- [37] Z. Jiang, F. Xu, L. Gao, Z. Sun, Q. Liu, J. Dwivedi-Yu, Y. Yang, J. Callan, and G. Neubig, "Active retrieval augmented generation," in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, H. Bouamor, J. Pino, and K. Bali, Eds. Singapore: Association for Computational Linguistics, Dec. 2023, pp. 7969–7992. [Online]. Avail...
- [38] T. Kwiatkowski, J. Palomaki, O. Redfield, M. Collins, A. Parikh, C. Alberti, D. Epstein, I. Polosukhin, J. Devlin, K. Lee, K. Toutanova, L. Jones, M. Kelcey, M.-W. Chang, A. M. Dai, J. Uszkoreit, Q. Le, and S. Petrov, "Natural questions: A benchmark for question answering research," Transactions of the Association for Computational Linguistics, vol. 7, pp....
- [39] M. Joshi, E. Choi, D. Weld, and L. Zettlemoyer, "TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension," in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), R. Barzilay and M.-Y. Kan, Eds. Vancouver, Canada: Association for Computational Linguistics, Jul. 20...
- [40] A. Mallen, A. Asai, V. Zhong, R. Das, D. Khashabi, and H. Hajishirzi, "When not to trust language models: Investigating effectiveness of parametric and non-parametric memories," in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), A. Rogers, J. Boyd-Graber, and N. Okazaki, Eds. Toronto, Canad...
- [41] X. Ho, A.-K. Duong Nguyen, S. Sugawara, and A. Aizawa, "Constructing a multi-hop QA dataset for comprehensive evaluation of reasoning steps," in Proceedings of the 28th International Conference on Computational Linguistics, D. Scott, N. Bel, and C. Zong, Eds. Barcelona, Spain (Online): International Committee on Computational Linguistics, Dec. 2020, pp. 66...
- [42] Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. Cohen, R. Salakhutdinov, and C. D. Manning, "HotpotQA: A dataset for diverse, explainable multi-hop question answering," in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, E. Riloff, D. Chiang, J. Hockenmaier, and J. Tsujii, Eds. Brussels, Belgium: Association for Computationa...
- [43] M. Geva, D. Khashabi, E. Segal, T. Khot, D. Roth, and J. Berant, "Did Aristotle use a laptop? A question answering benchmark with implicit reasoning strategies," Transactions of the Association for Computational Linguistics, vol. 9, pp. 346–361, 2021. [Online]. Available: https://aclanthology.org/2021.tacl-1.21/
- [44] Y. Du, S. Li, A. Torralba, J. B. Tenenbaum, and I. Mordatch, "Improving factuality and reasoning in language models through multiagent debate," in Proceedings of the 41st International Conference on Machine Learning, ser. ICML'24. JMLR.org, 2024. [Online]. Available: https://dl.acm.org/doi/10.5555/3692070.3692537
- [45] J. Kim, J. Nam, S. Mo, J. Park, S.-W. Lee, M. Seo, J.-W. Ha, and J. Shin, "SuRe: Summarizing retrievals using answer candidates for open-domain QA of LLMs," arXiv preprint arXiv:2404.13081, 2024. [Online]. Available: https://arxiv.org/abs/2404.13081
- [46] A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Yang, A. Fan et al., "The Llama 3 herd of models," arXiv preprint arXiv:2407.21783, 2024. [Online]. Available: https://arxiv.org/abs/2407.21783
- [47] J. Jin, Y. Zhu, X. Yang, C. Zhang, and Z. Dou, "FlashRAG: A modular toolkit for efficient retrieval-augmented generation research," CoRR, vol. abs/2405.13576, 2024. [Online]. Available: https://arxiv.org/abs/2405.13576
- [48] L. Wang, N. Yang, X. Huang, B. Jiao, L. Yang, D. Jiang, R. Majumder, and F. Wei, "Text embeddings by weakly-supervised contrastive pre-training," arXiv preprint arXiv:2212.03533, 2022. [Online]. Available: https://arxiv.org/abs/2212.03533