pith. machine review for the scientific record.

arxiv: 2605.01302 · v1 · submitted 2026-05-02 · 💻 cs.CL · cs.IR

Recognition: unknown

Beyond Semantic Relevance: Counterfactual Risk Minimization for Robust Retrieval-Augmented Generation

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 15:01 UTC · model grok-4.3

classification 💻 cs.CL cs.IR
keywords retrieval-augmented generation · counterfactual risk minimization · cognitive biases · robust retrieval · decision-making · Evidence Critic · adversarial robustness

The pith

CoRM-RAG aligns RAG retrieval with decision safety by minimizing counterfactual risk from simulated cognitive biases rather than maximizing semantic relevance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Retrieval-augmented generation systems usually pick documents that match the query semantically, but this breaks down when users hold false premises or seek confirmation of their biases. In those cases, relevant documents can reinforce errors and increase hallucinations instead of correcting them. CoRM-RAG addresses this by creating training data with simulated cognitive perturbations and training an Evidence Critic to favor documents that support correct outcomes even under those perturbations. If successful, this shifts the goal of retrieval from matching what the user says to enabling safe decisions despite how the user thinks. The result is better performance on adversarial decision tasks and the ability to abstain when no document provides robust enough evidence.

Core claim

Standard semantic relevance creates a Relevance-Robustness Gap because it favors sycophantic evidence that reinforces hallucinations when queries contain cognitive biases. CoRM-RAG counters this through causal intervention: a Cognitive Perturbation Protocol simulates biases such as false premises and confirmation bias to generate training perturbations. These are used to distill an Evidence Critic that scores documents according to their capacity to support correct decisions despite the perturbations. This yields superior results on decision-making benchmarks under adversarial conditions and permits abstention based on robustness scores.

What carries the argument

The Cognitive Perturbation Protocol, which generates query perturbations simulating user biases, distilled into an Evidence Critic that scores documents for evidential strength under those perturbations.
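The paper does not publish the protocol's implementation, so the sketch below is a hypothetical rendering of the two moving parts named above: bias templates that perturb a clean query, and a counterfactual robustness label for a document (the fraction of perturbed queries under which it still supports the correct answer). The template strings, function names, and labeling rule are all assumptions, not the authors' code.

```python
# Hypothetical sketch of a cognitive perturbation step; the paper's
# actual protocol, templates, and labeling rule are not published.

# Assumed bias templates: inject a false premise, or reframe the
# query as confirmation-seeking.
BIAS_TEMPLATES = {
    "false_premise": "Given that {premise}, {query}",
    "confirmation": "{query} I'm fairly sure it's {belief}, right?",
}

def perturb_query(query: str, premise: str, belief: str) -> list[str]:
    """Generate cognitively biased variants of a clean query."""
    return [
        BIAS_TEMPLATES["false_premise"].format(premise=premise, query=query),
        BIAS_TEMPLATES["confirmation"].format(query=query, belief=belief),
    ]

def robustness_label(answer, doc: str, variants: list[str], gold: str) -> float:
    """Counterfactual training target for a document: the fraction of
    perturbed queries for which conditioning the generator on `doc`
    still yields the gold answer."""
    hits = sum(1 for v in variants if answer(v, doc) == gold)
    return hits / len(variants)
```

A document then scores high only if it steers the generator to the correct answer across all simulated biases, which is the distillation target for the Evidence Critic.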

If this is right

  • Outperforms strong dense retrievers and LLM-based rerankers in adversarial decision-making settings.
  • Enables effective risk-aware abstention through reliable robustness scoring from the Evidence Critic.
  • Aligns retrieval selection with documents that maintain decision correctness despite query perturbations.
  • Allows distillation of the risk minimization into a lightweight module without requiring full model retraining.
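The abstention behavior in the bullets above reduces to a thresholding rule over critic scores. A minimal sketch, assuming a scalar robustness score per document; the critic and threshold below are illustrative stand-ins, not the paper's released module.

```python
# Sketch of risk-aware abstention over Evidence Critic scores.
# `critic_score` and the threshold `tau` are illustrative placeholders.

def select_or_abstain(docs, critic_score, tau=0.5):
    """Return the highest-scoring document, or None (abstain) when no
    candidate's robustness score clears the threshold."""
    if not docs:
        return None
    best = max(docs, key=critic_score)
    return best if critic_score(best) >= tau else None

# Toy critic for demonstration only: pretend longer passages carry
# more corroborating evidence.
def toy_critic(doc):
    return min(len(doc) / 40.0, 1.0)
```

Sweeping `tau` traces out the risk-coverage curve: a higher threshold abstains more often but makes fewer unsafe selections.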

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This method could be adapted to improve robustness in other generative AI tasks where input biases lead to sycophantic outputs.
  • The perturbation protocol might serve as a template for testing other causal assumptions in retrieval systems.
  • Integration with existing RAG pipelines could provide a plug-in safety layer for current systems.

Load-bearing premise

The Cognitive Perturbation Protocol generates training examples that match the distribution of cognitive biases, such as false premises and confirmation bias, that real users introduce in decision-making queries.

What would settle it

A direct comparison on a dataset of real user queries with verified cognitive biases, checking if CoRM-RAG's selected documents lead to fewer hallucinations than relevance-based retrieval and if the robustness scores predict actual decision errors.
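Both halves of that test reduce to two measurable quantities, sketched here on hypothetical data: a hallucination rate per retrieval policy, and a check that robustness scores rank erroneous decisions below correct ones (a pairwise AUC).

```python
# Illustrative metrics for the settling experiment; the data
# structures are hypothetical, not drawn from the paper.

def hallucination_rate(hallucinated):
    """Fraction of answers flagged as hallucinations (list of bools)."""
    return sum(hallucinated) / len(hallucinated)

def ranking_auc(scores, is_error):
    """Probability that a random erroneous decision receives a lower
    robustness score than a random correct one (0.5 = uninformative)."""
    err = [s for s, e in zip(scores, is_error) if e]
    ok = [s for s, e in zip(scores, is_error) if not e]
    wins = sum(1.0 if a < b else 0.5 if a == b else 0.0
               for a in err for b in ok)
    return wins / (len(err) * len(ok))
```

The paper's claim would be settled if, on real biased queries, CoRM-RAG's hallucination rate is lower than the relevance baseline's and the AUC is well above 0.5.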

Figures

Figures reproduced from arXiv: 2605.01302 by Di Liang, Peiyang Liu, Qiang Yan, Wei Ye, Xi Wang, Ziqiang Cui.

Figure 1: The Relevance-Robustness Gap. Left: Standard RAG …
Figure 2: The CoRM-RAG Framework. The pipeline consists of two phases: (1) Counterfactual Training (Top): We apply a …
Figure 3: Risk-Coverage Analysis on Biased-NQ. We plot …
Figure 4: Retrieval Quality on Biased-NQ. (a) Recall@ …
Figure 5: Ablation Study on Cognitive Perturbation Types.
Figure 6: Results of Hyperparameter Analysis. [Plot: average latency per query (ms, log scale) vs. accuracy on Biased-NQ (%) for CoRM-RAG (Ours), Standard Retrieval, Dense Retrieval, Cross-Encoder, and Generative Methods.]
Figure 7: Efficiency-Performance Pareto Frontier on Biased …
Original abstract

Standard Retrieval-Augmented Generation (RAG) systems predominantly rely on semantic relevance as a proxy for utility. However, this assumption collapses in realistic decision-making scenarios where user queries are laden with cognitive biases, such as false premises or confirmation bias. In such cases, maximizing relevance paradoxically promotes the retrieval of sycophantic evidence that reinforces hallucinations, a critical failure we term the "Relevance-Robustness Gap". To bridge this gap, we propose CoRM-RAG (Counterfactual Risk Minimization for RAG), a framework that aligns retrieval with decision safety rather than mere similarity. Grounded in causal intervention, we introduce a Cognitive Perturbation Protocol to simulate user biases during training, which is then distilled into a lightweight Evidence Critic. This scoring module learns to identify documents that possess sufficient evidential strength to steer the model toward correctness despite adversarial query perturbations. Extensive experiments on decision-making benchmarks demonstrate that CoRM-RAG significantly outperforms strong dense retrievers and LLM-based rerankers in adversarial settings, while enabling effective risk-aware abstention through reliable robustness scoring. Our code is available at https://github.com/PeiYangLiu/CoRM-RAG.git.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript identifies a 'Relevance-Robustness Gap' in standard RAG systems, where semantic similarity maximization retrieves sycophantic evidence under cognitively biased queries (false premises, confirmation bias). It introduces CoRM-RAG, which applies counterfactual risk minimization, generates training perturbations via a Cognitive Perturbation Protocol, and distills them into a lightweight Evidence Critic that scores evidential strength for robustness. The central claim is that this yields significant outperformance over dense retrievers and LLM rerankers on decision-making benchmarks in adversarial settings, plus reliable robustness scores for risk-aware abstention. Code is released at the cited GitHub repository.

Significance. If the empirical results and transfer assumptions hold, the work could meaningfully advance reliable RAG for decision-making tasks by shifting retrieval objectives from semantic relevance to decision safety. The open-source code release is a clear positive for reproducibility.

major comments (2)
  1. [Abstract] The headline claim of outperformance ('significantly outperforms strong dense retrievers and LLM-based rerankers in adversarial settings') is asserted without accompanying dataset names, statistical tests, ablation results, or effect-size reporting, leaving the central empirical contribution impossible to evaluate from the provided description.
  2. [Framework Description] The Cognitive Perturbation Protocol is defined to simulate user biases for training the Evidence Critic, yet no quantitative validation (embedding overlap, bias-type frequency tables, or human-judgment studies) is reported comparing protocol outputs to observed real-world biased queries. This assumption is load-bearing for the claim that the reported gains will transfer beyond the synthetic perturbations.
minor comments (1)
  1. [Abstract] The phrase 'Relevance-Robustness Gap' is introduced as a novel failure mode without citation to prior RAG robustness literature that discusses similar relevance-hallucination issues.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, indicating where revisions will be made to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The headline claim of outperformance ('significantly outperforms strong dense retrievers and LLM-based rerankers in adversarial settings') is asserted without accompanying dataset names, statistical tests, ablation results, or effect-size reporting, leaving the central empirical contribution impossible to evaluate from the provided description.

    Authors: We agree that the abstract would benefit from greater specificity to allow immediate evaluation of the central claims. In the revised version, we will update the abstract to name the decision-making benchmarks (including multi-hop QA and fact-verification tasks under adversarial perturbations), note the statistical significance of the reported gains, and briefly reference the ablation studies and effect sizes detailed in the experimental section. These additions will remain concise given abstract length limits while directing readers to the full results. revision: yes

  2. Referee: [Framework Description] The Cognitive Perturbation Protocol is defined to simulate user biases for training the Evidence Critic, yet no quantitative validation (embedding overlap, bias-type frequency tables, or human-judgment studies) is reported comparing protocol outputs to observed real-world biased queries. This assumption is load-bearing for the claim that the reported gains will transfer beyond the synthetic perturbations.

    Authors: We appreciate the referee's emphasis on validating the Cognitive Perturbation Protocol's fidelity to real-world biases. The protocol draws directly from established categories in the cognitive psychology literature, and its utility is supported by the empirical robustness gains across adversarial test conditions. We acknowledge that explicit quantitative comparisons (e.g., embedding overlap or frequency tables against real query logs) were not reported. In the revision, we will add a new analysis subsection presenting embedding similarity metrics, bias-type distributions, and a discussion of transfer assumptions, thereby addressing the load-bearing concern. revision: yes
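The promised embedding-similarity analysis could take a form like the following; this is purely illustrative, since the authors' actual metric, embedding model, and aggregation rule are not specified. For each synthetic perturbation, find its nearest real biased query in embedding space and average the cosine similarities.

```python
import math

# Illustrative form of the promised validation; the embedding model
# and the nearest-neighbor aggregation are assumptions, not the
# authors' analysis.

def cosine(u, v):
    """Cosine similarity of two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mean_nearest_similarity(synthetic, real):
    """Average, over synthetic perturbation embeddings, of the cosine
    similarity to the nearest real biased-query embedding. Values near
    1 would support the transfer assumption; values near 0 would not."""
    return sum(max(cosine(s, r) for r in real) for s in synthetic) / len(synthetic)
```

Paired with bias-type frequency tables, such a score would make the protocol's distributional match to real queries directly auditable.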

Circularity Check

0 steps flagged

No circularity: novel framework with independent experimental claims.

Full rationale

The paper introduces CoRM-RAG via a Cognitive Perturbation Protocol and Evidence Critic module, but provides no equations, fitting procedures, or derivations that reduce any claimed prediction or result to its inputs by construction. No self-citations are used to justify uniqueness theorems or ansatzes, and no load-bearing step renames a known result or calls a fitted input a prediction. The training targets are defined relative to the new perturbations as part of the proposed method, which is a standard definitional choice for a new framework rather than a tautological reduction. Experiments on benchmarks are presented as external validation of outperformance, keeping the derivation chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

Abstract-only review; no mathematical definitions or implementation details are supplied, so the ledger is populated only from explicitly named new constructs and the stated domain assumption.

axioms (1)
  • Domain assumption: Semantic relevance is an inadequate proxy for retrieval utility when user queries contain cognitive biases.
    Directly stated as the motivation for the Relevance-Robustness Gap.
invented entities (2)
  • Cognitive Perturbation Protocol (no independent evidence)
    purpose: Simulate user biases during training to generate adversarial query variants.
    A new training procedure introduced without external validation or formal definition.
  • Evidence Critic (no independent evidence)
    purpose: Lightweight scoring module that identifies documents with sufficient evidential strength for correct decisions under perturbations.
    A new distilled module whose training objective is defined relative to the perturbation protocol.

pith-pipeline@v0.9.0 · 5514 in / 1458 out tokens · 51301 ms · 2026-05-09T15:01:01.614330+00:00 · methodology


Reference graph

Works this paper leans on

86 extracted references · 37 canonical work pages · 17 internal anchors

  1. [1]

    Shakiba Amirshahi, Amin Bigdeli, Charles LA Clarke, and Amira Ghenai. 2025. Evaluating the Robustness of Retrieval-Augmented Generation to Adversarial Evidence in the Health Domain. arXiv preprint arXiv:2509.03787 (2025)

  2. [2]

    Guojia An, Jie Zou, Jiwei Wei, Chaoning Zhang, Fuming Sun, and Yang Yang. 2025. Beyond whole dialogue modeling: Contextual disentanglement for conversational recommendation. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 31–41

  3. [3]

    Dang Anh-Hoang, Vu Tran, and Le-Minh Nguyen. 2025. Survey and analysis of hallucinations in large language models: attribution to prompting strategies or model behavior. Frontiers in Artificial Intelligence 8 (2025), 1622292

  4. [4]

    Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. 2024. Self-rag: Learning to retrieve, generate, and critique through self-reflection. (2024)

  6. [6]

    Rishi Bommasani. 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021)

  7. [7]

    Viola Campos, Robin Kuschnereit, and Adrian Ulges. 2025. Multicalibration for LLM-based Code Generation. arXiv preprint arXiv:2512.08810 (2025)

  8. [8]

    Erica Cau, Valentina Pansanella, Dino Pedreschi, and Giulio Rossetti. 2025. Selective agreement, not sycophancy: investigating opinion dynamics in LLM interactions. EPJ Data Science 14, 1 (2025), 59

  9. [9]

    Yingshan Chang, Mridu Narang, Hisami Suzuki, Guihong Cao, Jianfeng Gao, and Yonatan Bisk. 2022. Webqa: Multihop and multimodal qa. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16495–16504

  10. [10]

    Jiawei Chen, Hongyu Lin, Xianpei Han, and Le Sun. 2024. Benchmarking large language models in retrieval-augmented generation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 17754–17762

  11. [11]

    Myra Cheng, Sunny Yu, Cinoo Lee, Pranav Khadpe, Lujain Ibrahim, and Dan Jurafsky. 2025. Social sycophancy: A broader understanding of llm sycophancy. arXiv preprint arXiv:2505.13995 (2025)

  12. [12]

    Ivan Andre Naranjo Coronel, Can Demircan, and Eric Schulz. [n. d.]. How Does an LLM Process Conflicting Information In-Context? ([n. d.])

  13. [13]

    Florin Cuconasu, Giovanni Trappolini, Federico Siciliano, Simone Filice, Cesare Campagnano, Yoelle Maarek, Nicola Tonellotto, and Fabrizio Silvestri. 2024. The power of noise: Redefining retrieval for rag systems. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. 719–729

  14. [14]

    Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, and Yaodong Yang. 2023. Safe rlhf: Safe reinforcement learning from human feedback. arXiv preprint arXiv:2310.12773 (2023)

  15. [15]

    Zhiying Deng, Wei Liu, Jianjun Li, Zhiqiang Guo, Qian Chen, and Juan Zhao. 2025. Behavior-Aware Global-Enhanced Neural Modeling for Sequential Set Recommendation. IEEE Transactions on Artificial Intelligence (2025)

  17. [17]

    Haonan Dong, Kehan Jiang, Haoran Ye, Wenhao Zhu, Zhaolu Kang, and Guojie Song. 2026. NeuReasoner: Towards Explainable, Controllable, and Unified Reasoning via Mixture-of-Neurons. arXiv preprint arXiv:2604.02972 (2026)

  18. [18]

    Haonan Dong, Wenhao Zhu, Guojie Song, and Liang Wang. 2025. Aurora: Breaking low-rank bottleneck of lora with nonlinear mapping. arXiv preprint arXiv:2505.18738 (2025)

  19. [19]

    Yangyi Fang, Jiaye Lin, Xiaoliang Fu, Cong Qin, Haolin Shi, Chaowen Hu, Lu Pan, Ke Zeng, and Xunliang Cai. 2026. How to allocate, how to learn? dynamic rollout allocation and advantage modulation for policy optimization. arXiv preprint arXiv:2602.19208 (2026)

  20. [20]

    Yangyi Fang, Jiaye Lin, Xiaoliang Fu, Cong Qin, Haolin Shi, Chang Liu, and Peilin Zhao. 2026. Proximity-Based Multi-Turn Optimization: Practical Credit Assignment for LLM Agent Training. arXiv preprint arXiv:2602.19225 (2026)

  21. [21]

    Aaron Fanous, Jacob Goldberg, Ank Agarwal, Joanna Lin, Anson Zhou, Sonnet Xu, Vasiliki Bikia, Roxana Daneshjou, and Sanmi Koyejo. 2025. Syceval: Evaluating llm sycophancy. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, Vol. 8. 893–900

  22. [22]

    Xiaoliang Fu, Jiaye Lin, Yangyi Fang, Chaowen Hu, Cong Qin, Zekai Shao, Binbin Zheng, Lu Pan, and Ke Zeng. 2026. From log𝝅 to 𝝅: Taming Divergence in Soft Clipping via Bilateral Decoupled Decay of Probability Gradient Weight. arXiv preprint arXiv:2603.14389 (2026)

  23. [23]

    Xiaoliang Fu, Jiaye Lin, Yangyi Fang, Binbin Zheng, Chaowen Hu, Zekai Shao, Cong Qin, Lu Pan, Ke Zeng, and Xunliang Cai. 2026. Maspo: Unifying gradient utilization, probability mass, and signal reliability for robust and sample-efficient llm reasoning. arXiv preprint arXiv:2602.17550 (2026)

  24. [24]

    Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. 2020. Retrieval augmented language model pre-training. In International Conference on Machine Learning. PMLR, 3929–3938

  25. [25]

    Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave. 2021. Unsupervised dense information retrieval with contrastive learning. arXiv preprint arXiv:2112.09118 (2021)

  26. [26]

    Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of hallucination in natural language generation. ACM Computing Surveys 55, 12 (2023), 1–38

  27. [27]

    Kehan Jiang, Haonan Dong, Zhaolu Kang, Zhengzhou Zhu, and Guojie Song. 2026. Foe: Forest of errors makes the first solution the best in large reasoning models. arXiv preprint arXiv:2604.02967 (2026)

  29. [29]

    Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. 2017. Unbiased learning-to-rank with biased feedback. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. 781–789

  30. [30]

    Sungwon Kim and Daniel Khashabi. 2025. Challenging the Evaluator: LLM Sycophancy Under User Rebuttal. arXiv preprint arXiv:2509.16533 (2025)

  31. [31]

    Satyapriya Krishna, Kalpesh Krishna, Anhad Mohananey, Steven Schwarcz, Adam Stambler, Shyam Upadhyay, and Manaal Faruqui. 2025. Fact, fetch, and reason: A unified evaluation of retrieval-augmented generation. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technol...

  32. [32]

    Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, et al. 2019. Natural questions: a benchmark for question answering research. Transactions of the Association for Computational Linguistics 7 (2019), 453–466

  33. [33]

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems 33 (2020), 9459–9474

  34. [34]

    Bo Li, Tian Tian, Zhenghua Xu, Hao Cheng, Shikun Zhang, and Wei Ye. 2026. Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG. In Fortieth AAAI Conference on Artificial Intelligence, Thirty-Eighth Conference on Innovative Applications of Artificial Intelligence, Sixteenth Symposium on Educational Advances in Artificial Intelligence, AAAI 2026...

  35. [35]

    Bo Li, Mingda Wang, Gexiang Fang, Shikun Zhang, and Wei Ye. 2026. Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning. arXiv:2604.11407 [cs.CL] https://arxiv.org/abs/2604.11407

  36. [36]

    Bo Li, Mingda Wang, Shikun Zhang, and Wei Ye. 2026. Instruction Data Selection via Answer Divergence. arXiv:2604.10448 [cs.CL] https://arxiv.org/abs/2604.10448

  37. [37]

    Bo Li, Shikun Zhang, and Wei Ye. 2026. Data Selection for Multi-turn Dialogue Instruction Tuning. arXiv:2604.07892 [cs.CL] https://arxiv.org/abs/2604.07892

  38. [38]

    Mengdi Li, Jiaye Lin, Xufeng Zhao, Wenhao Lu, Peilin Zhao, Stefan Wermter, and Di Wang. 2025. Curriculum-rlaif: Curriculum alignment with reinforcement learning from ai feedback. arXiv preprint arXiv:2505.20075 (2025)

  39. [39]

    Xiping Li and Jianghong Ma. 2025. AIMCoT: Active Information-driven Multimodal Chain-of-Thought for Vision-Language Reasoning. arXiv preprint arXiv:2509.25699 (2025)

  40. [40]

    Xiping Li, Jianghong Ma, Kangzhe Liu, Shanshan Feng, Haijun Zhang, and Yutong Wang. 2024. Category-based and popularity-guided video game recommendation: a balance-oriented framework. In Proceedings of the ACM Web Conference 2024. 3734–3744

  41. [41]

    Xiping Li, Aier Yang, Jianghong Ma, Kangzhe Liu, Shanshan Feng, Haijun Zhang, and Yi Zhao. 2026. CPGRec+: A Balance-oriented Framework for Personalized Video Game Recommendations. ACM Transactions on Information Systems 44, 3 (2026), 1–44

  42. [42]

    Yuqing Li, Jiangnan Li, Zheng Lin, Ziyan Zhou, Junjie Wu, Weiping Wang, Jie Zhou, and Mo Yu. 2025. Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding. arXiv preprint arXiv:2512.17220 (2025)

  43. [43]

    Yuqing Li, Jiangnan Li, Mo Yu, Guoxuan Ding, Zheng Lin, Weiping Wang, and Jie Zhou. 2026. Query-focused and Memory-aware Reranker for Long Context Processing. arXiv preprint arXiv:2602.12192 (2026)

  44. [44]

    Jiaye Lin, Yifu Guo, Yuzhen Han, Sen Hu, Ziyi Ni, Licheng Wang, Mingguang Chen, Hongzhang Liu, Ronghao Chen, Yangfan He, et al. 2025. Se-agent: Self-evolution trajectory optimization in multi-step reasoning with llm-based agents. arXiv preprint arXiv:2508.02085 (2025)

  45. [45]

    Stephanie Lin, Jacob Hilton, and Owain Evans. 2022. Truthfulqa: Measuring how models mimic human falsehoods. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 3214–3252

  46. [46]

    Peiyang Liu. 2024. Unsupervised corrupt data detection for text training. Expert Systems with Applications 248 (2024), 123335

  47. [47]

    Peiyang Liu, Zhirui Chen, Xi Wang, Di Liang, Youru Li, Zhi Cai, and Wei Ye. 2026. Learning from Contrasts: Synthesizing Reasoning Paths from Diverse Search Trajectories. arXiv:2604.11365 [cs.AI] https://arxiv.org/abs/2604.11365

  48. [48]

    Peiyang Liu, Ziqiang Cui, Di Liang, and Wei Ye. 2025. Who Stole Your Data? A Method for Detecting Unauthorized RAG Theft. arXiv preprint arXiv:2510.07728 (2025)

  49. [49]

    Peiyang Liu, Sen Wang, Xi Wang, Wei Ye, and Shikun Zhang. 2021. Quadruplet-BERT: An efficient model for embedding-based large-scale retrieval. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 3734–3739

  50. [50]

    Peiyang Liu, Xi Wang, Ziqiang Cui, and Wei Ye. 2025. Queries Are Not Alone: Clustering Text Embeddings for Video Search. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 874–883

  51. [51]

    Peiyang Liu, Xi Wang, Lin Wang, Wei Ye, Xiangyu Xi, and Shikun Zhang. 2021. Distilling knowledge from bert into simple fully connected neural networks for efficient vertical retrieval. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 3965–3975

  52. [52]

    Peiyang Liu, Xiangyu Xi, Wei Ye, and Shikun Zhang. 2022. Label smoothing for text mining. In Proceedings of the 29th International Conference on Computational Linguistics. 2210–2219

  53. [53]

    Peiyang Liu, Jinyu Yang, Lin Wang, Sen Wang, Yunlai Hao, and Huihui Bai. 2023. Retrieval-Based Unsupervised Noisy Label Detection on Text Data. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 4099–4104

  54. [54]

    Peiyang Liu, Wei Ye, Xiangyu Xi, Tong Wang, Jinglei Zhang, and Shikun Zhang. 2020. Not all synonyms are created equal: Incorporating similarity of synonyms to enhance word embeddings. In 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–8

  56. [56]

    Wei Liu, Zhiying Deng, Zhongyu Niu, Jun Wang, Haozhao Wang, and Ruixuan Li. 2025. Exploring practical gaps in using cross entropy to implement maximum mutual information criterion for rationalization. Transactions of the Association for Computational Linguistics 13 (2025), 577–594

  57. [57]

    Lingyu Mu, Hao Deng, Haibo Xing, Jinxin Hu, Yu Zhang, Xiaoyi Zeng, and Jing Zhang. 2026. Masked Diffusion Generative Recommendation. arXiv preprint arXiv:2601.19501 (2026)

  58. [58]

    Karen Ka Yan Ng, Izuki Matsuba, and Peter Chengming Zhang. 2025. RAG in health care: a novel framework for improving communication and decision-making by addressing LLM limitations. NEJM AI 2, 1 (2025), AIra2400380

  59. [59]

    Raymond S Nickerson. 1998. Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology 2, 2 (1998), 175–220

  60. [60]

    Agada Joseph Oche, Ademola Glory Folashade, Tirthankar Ghosal, and Arpan Biswas. 2025. A systematic review of key retrieval-augmented generation (rag) systems: Progress, gaps, and future directions. arXiv preprint arXiv:2507.18910 (2025)

  61. [61]

    Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems 35 (2022), 27730–27744

  62. [62]

    Henry Papadatos and Rachel Freedman. 2024. Linear probe penalties reduce llm sycophancy. arXiv preprint arXiv:2412.00967 (2024)

  63. [63]

    Boci Peng, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, and Siliang Tang. 2025. Graph retrieval-augmented generation: A survey. ACM Transactions on Information Systems 44, 2 (2025), 1–52

  64. [64]

    Priya Pitre, Naren Ramakrishnan, and Xuan Wang. 2025. CONSENSAGENT: Towards efficient and effective consensus in multi-agent LLM interactions through sycophancy mitigation. In Findings of the Association for Computational Linguistics: ACL 2025. 22112–22133

  65. [65]

    Guozhi Qiu, Zhiwei Chen, Zixu Li, Qinlei Huang, Zhiheng Fu, Xuemeng Song, and Yupeng Hu. 2026. MELT: Improve Composed Image Retrieval via the Modification Frequentation-Rarity Balance Network. In ICASSP 2026-2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 13007–13011

  66. [66]

    Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R Johnston, et al. 2023. Towards understanding sycophancy in language models. arXiv preprint arXiv:2310.13548 (2023)

  67. [67]

    Marco Siino, Mariana Falco, Daniele Croce, and Paolo Rosso. 2025. Exploring llms applications in law: A literature review on current legal nlp approaches. IEEE Access (2025)

  68. [68]

    Weihang Su, Yichen Tang, Qingyao Ai, Junxi Yan, Changyue Wang, Hongning Wang, Ziyi Ye, Yujia Zhou, and Yiqun Liu. 2025. Parametric retrieval augmented generation. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1240–1250

  69. [69]

    Yuan Sun and Ting Wang. 2025. Be friendly, not friends: How llm sycophancy shapes user trust. arXiv preprint arXiv:2502.10844 (2025)

  70. [70]

    Qwen Team. 2025. Qwen3 Technical Report. arXiv:2505.09388 [cs.CL] https://arxiv.org/abs/2505.09388

  71. [71]

    Ayush Thakur and Rashmi Vashisth. 2024. Loops on retrieval augmented generation (lorag). arXiv preprint arXiv:2403.15450 (2024)

  72. [72]

    Jingru Wang, Wen Ding, and Xiaotong Zhu. 2025. Financial analysis: Intelligent financial data analysis system based on llm-rag. arXiv preprint arXiv:2504.06279 (2025)

  73. [73]

    Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, et al. 2024. Searching for best practices in retrieval-augmented generation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 17716–17736

  74. [74]

    Jerry Wei, Da Huang, Yifeng Lu, Denny Zhou, and Quoc V Le. 2023. Simple synthetic data reduces sycophancy in large language models. arXiv preprint arXiv:2308.03958 (2023)

  75. [75]

    Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, et al. 2024. Retrieval-augmented generation for natural language processing: A survey. arXiv preprint arXiv:2407.13193 (2024)

  76. [76]

    Haibo Xing, Hao Deng, Yucheng Mao, Lingyu Mu, Jinxin Hu, Yi Xu, Hao Zhang, Jiahao Wang, Shizhun Wang, Yu Zhang, et al. 2025. Reg4rec: Reasoning-enhanced generative model for large-scale recommendation systems. arXiv preprint arXiv:2508.15308 (2025)

  77. [77]

    Qianyun Yang, Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, and Liqiang Nie. 2026. STABLE: Efficient Hybrid Nearest Neighbor Search via Magnitude-Uniformity and Cardinality-Robustness. IEEE Transactions on Knowledge and Data Engineering (2026)

  78. [78]

    Yuhan Yang, Jie Zou, Guojia An, Jiwei Wei, Yang Yang, and Heng Tao Shen. Unleashing the potential of neighbors: diffusion-based latent neighbor generation for session-based recommendation. In Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1. 1787–1796

  80. [80]

    Ori Yoran, Tomer Wolfson, Ori Ram, and Jonathan Berant. 2023. Making retrieval-augmented language models robust to irrelevant context. arXiv preprint arXiv:2310.01558 (2023)

Showing first 80 references.