REFLEX: Self-Refining Explainable Fact-Checking via Verdict-Anchored Style Control
Pith reviewed 2026-05-17 05:34 UTC · model grok-4.3
The pith
REFLEX disentangles fact from style in LLM fact-checking explanations by building verdict-anchored steering vectors from self-disagreement signals.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
REFLEX is a self-refining paradigm that explicitly controls reasoning style anchored on the verdict. It uses self-disagreement veracity signals between the backbone model and its fine-tuned variant to construct steering vectors that naturally disentangle fact from style. Experiments on real-world datasets show it reaches state-of-the-art performance with LLaMA-series models using only 465 self-refined samples, yields up to a 7.54% gain on in-the-wild data thanks to its transferability, and mitigates faithful hallucination, producing more accurate verdicts than prior explainable fact-checking methods.
What carries the argument
Verdict-anchored style control via steering vectors constructed from self-disagreement veracity signals between a backbone LLM and its fine-tuned variant.
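The mechanism can be sketched in a few lines. The sketch below is illustrative only: synthetic arrays stand in for real LLaMA hidden states, and the function names, layer choice, and scaling factor `alpha` are our assumptions, not the paper's implementation. The idea is to collect hidden states from the backbone and its fine-tuned variant, restrict to samples where their verdicts disagree, average the activation difference into a steering vector, and add a scaled copy of it at inference.

```python
import numpy as np

def build_steering_vector(backbone_acts, finetuned_acts, disagree_mask):
    """Mean activation difference on self-disagreement samples.

    backbone_acts, finetuned_acts: (n_samples, hidden_dim) arrays of
    hidden states at one chosen layer; disagree_mask: boolean array
    marking samples where the two models' verdicts differ.
    """
    diff = finetuned_acts[disagree_mask] - backbone_acts[disagree_mask]
    return diff.mean(axis=0)

def steer(hidden_state, steering_vector, alpha=1.0):
    """Add the scaled steering vector to a hidden state at inference."""
    return hidden_state + alpha * steering_vector

# Toy stand-ins for real model activations.
rng = np.random.default_rng(0)
n, d = 465, 16                      # 465 samples, toy hidden size
backbone = rng.normal(size=(n, d))
finetuned = backbone + 0.5          # toy offset standing in for fine-tuning
mask = rng.random(n) < 0.3          # toy disagreement indicator

v = build_steering_vector(backbone, finetuned, mask)
h = steer(np.zeros(d), v, alpha=2.0)
```

In the paper's setting the contrast set would be the 465 self-refined samples, and the vector would be injected into a chosen transformer layer during generation, in the spirit of contrastive activation addition.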
If this is right
- REFLEX reaches state-of-the-art performance on LLaMA-series models using only 465 self-refined samples.
- The approach delivers up to a 7.54% performance gain on in-the-wild data through its transferability.
- REFLEX reduces faithful hallucination in explanations and supports more accurate verdicts than earlier explainable fact-checking systems.
Where Pith is reading between the lines
- The steering-vector technique could apply to other LLM tasks that require separating content accuracy from output style without large external datasets.
- Lower sample requirements might allow fact-checking tools to adapt quickly to emerging misinformation topics with minimal retraining.
- Testing the method across additional social-media platforms could show whether the fact-style separation holds for varied misinformation formats.
Load-bearing premise
That self-disagreement veracity signals between the backbone model and its fine-tuned variant can naturally disentangle fact from style without introducing new biases or requiring external validation.
What would settle it
A controlled test on a held-out fact-checking dataset in which REFLEX-generated explanations still exhibit style-induced misleading content or produce lower verdict accuracy than standard fine-tuning baselines.
Figures
Original abstract
The prevalence of fake news on social media demands automated fact-checking systems to provide accurate verdicts with faithful explanations. However, existing large language model (LLM)-based approaches ignore deceptive misinformation styles in LLM-generated explanations, resulting in unfaithful rationales that can mislead human judgments. They rely heavily on external knowledge sources, introducing hallucinations and even high latency that undermine reliability and responsiveness, which is crucial for real-time use. To address these challenges, we propose REason-guided Fact-checking with Latent EXplanations (REFLEX), a self-refining paradigm that explicitly controls reasoning style anchored on verdict. REFLEX utilizes self-disagreement veracity signals between the backbone model and its fine-tuned variant to construct steering vectors, naturally disentangling fact from style. Experiments on the real-world dataset show REFLEX achieves state-of-the-art performance under LLaMA-series models with only 465 self-refined samples. Moreover, owing to its transferability, REFLEX yields up to a 7.54% gain on in-the-wild data. Our results further demonstrate that our method effectively mitigates faithful hallucination, thereby guiding the model toward more accurate verdicts than previous works in explainable fact-checking.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes REFLEX, a self-refining paradigm for explainable fact-checking. It constructs steering vectors from self-disagreement veracity signals between a backbone LLaMA model and its fine-tuned variant on 465 samples to enable verdict-anchored style control. This is intended to disentangle fact from deceptive style in explanations, reducing faithful hallucinations without external knowledge. The authors claim SOTA performance under LLaMA-series models on real-world data, up to 7.54% gains on in-the-wild data, and improved mitigation of faithful hallucination.
Significance. If the results and the disentanglement hold under rigorous controls, REFLEX could offer a practical advance in LLM-based fact-checking by achieving strong performance and style control with minimal self-refined data and without external retrieval. The reported transferability to in-the-wild settings and the focus on mitigating unfaithful rationales are potentially valuable for real-time applications. The small sample count (465) would be a notable efficiency strength if the evaluation demonstrates clear separation of style from factuality.
major comments (2)
- [Abstract] The central performance claims (SOTA results, the 7.54% in-the-wild gain, and mitigation of faithful hallucination) are presented without any mention of baselines, experimental controls, error bars, statistical significance, or criteria for selecting the 465 samples. This absence is load-bearing because the soundness of the reported gains cannot be assessed from the information provided.
- [Method] The steering vectors are derived from disagreement between the backbone and its fine-tuned variant on the same 465 samples. Without a controlled measurement or ablation demonstrating that these vectors modulate explanation style independently of factuality (rather than capturing calibration artifacts or uncertainty), the claim that self-disagreement naturally disentangles fact from style remains unverified and risks circular reinforcement of the fine-tuned model's outputs.
minor comments (2)
- Clarify whether the 465 self-refined samples are drawn from the evaluation distribution or held out, and provide the exact fine-tuning procedure for the variant model to allow reproducibility.
- [Introduction] The abstract uses the term 'faithful hallucination' without a precise definition or reference to prior usage; a brief operational definition in the introduction would improve clarity.
Simulated Author's Rebuttal
We thank the referee for their insightful comments on our manuscript. We address each of the major comments below and outline the revisions we will make to improve the paper.
Point-by-point responses
Referee: [Abstract] The central performance claims (SOTA results, the 7.54% in-the-wild gain, and mitigation of faithful hallucination) are presented without any mention of baselines, experimental controls, error bars, statistical significance, or criteria for selecting the 465 samples. This absence is load-bearing because the soundness of the reported gains cannot be assessed from the information provided.
Authors: We concur that the abstract lacks sufficient detail to fully contextualize our claims. To address this, we will revise the abstract to include references to the baselines (such as standard fine-tuned LLMs and prior explainable fact-checking approaches), experimental controls including multiple evaluation runs, error bars representing standard deviations, statistical significance via appropriate tests, and the selection criteria for the 465 samples as a randomly sampled balanced subset from the available training data. These additions will make the performance claims more transparent and assessable.
Revision: yes
Referee: [Method] The steering vectors are derived from disagreement between the backbone and its fine-tuned variant on the same 465 samples. Without a controlled measurement or ablation demonstrating that these vectors modulate explanation style independently of factuality (rather than capturing calibration artifacts or uncertainty), the claim that self-disagreement naturally disentangles fact from style remains unverified and risks circular reinforcement of the fine-tuned model's outputs.
Authors: This is a valid concern about the verification of the disentanglement mechanism. Our current experiments demonstrate that applying the steering vectors improves both verdict accuracy and explanation quality over the fine-tuned model, suggesting the signals capture useful style information. However, to more rigorously demonstrate independence from factuality and rule out calibration artifacts, we will add controlled ablations in the revised manuscript. These will include comparisons with steering vectors from non-veracity disagreements and quantitative measures of style (e.g., via perplexity on style-specific prompts) versus factuality metrics. We believe this will substantiate the claim without circularity.
Revision: yes
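A minimal numerical version of such an independence check can be sketched as follows. This is our own toy construction, not the authors' protocol: estimate a "factuality direction" from activations of truth-contrasted examples, then measure what fraction of a candidate steering vector's norm lies along that direction. A value near zero is consistent with the fact/style disentanglement claim; a large value suggests the vector also carries veracity information.

```python
import numpy as np

def direction(pos_acts, neg_acts):
    """Unit vector separating mean activations of two contrast sets."""
    d = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def projection_fraction(steering_vec, fact_dir):
    """Fraction of the steering vector's norm lying along fact_dir."""
    return abs(steering_vec @ fact_dir) / np.linalg.norm(steering_vec)

# Toy setup: synthetic activations for true vs. false claims.
rng = np.random.default_rng(1)
d = 32
fact_dir = direction(rng.normal(1.0, 1.0, (50, d)),
                     rng.normal(-1.0, 1.0, (50, d)))

# A toy "style" steering vector built to be orthogonal to fact_dir,
# i.e. the ideal disentangled case.
raw = rng.normal(size=d)
style_vec = raw - (raw @ fact_dir) * fact_dir

frac = projection_fraction(style_vec, fact_dir)
```

On real models the contrast sets would come from labeled claims, and the same fraction computed for REFLEX's steering vectors would quantify how much veracity signal they carry.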
Circularity Check
No significant circularity; method claims rest on independent experimental validation
Full rationale
The paper proposes REFLEX as a self-refining paradigm that constructs steering vectors from self-disagreement signals between a backbone model and its fine-tuned variant on 465 samples, claiming this naturally disentangles fact from style. Performance is reported via SOTA results on real-world datasets and up to 7.54% gains on separate in-the-wild data, with explicit mitigation of faithful hallucination. No quoted derivation step reduces by construction to its inputs, no fitted parameter is relabeled as a prediction, and no load-bearing premise relies on a self-citation chain. The approach is self-contained against external benchmarks and does not exhibit any of the enumerated circular patterns.
Axiom & Free-Parameter Ledger
free parameters (1)
- number of self-refined samples
axioms (1)
- Domain assumption: Self-disagreement between the backbone and fine-tuned model yields disentangled fact-style signals.
Reference graph
Works this paper leans on
- [1] Massih-Reza Amini, Vasilii Feofanov, Loic Pauletto, Lies Hadjadj, Emilie Devijver, and Yury Maximov. 2025. Self-training: A survey. Neurocomputing 616 (2025), 128904.
- [2] Pepa Atanasova. 2024. Generating fact checking explanations. In Accountable and Explainable Methods for Complex Reasoning over Text. Springer, 83–103.
- [3]
- [4] Collin Burns, Haotian Ye, Dan Klein, and Jacob Steinhardt. [n. d.]. Discovering Latent Knowledge in Language Models Without Supervision. In The Eleventh International Conference on Learning Representations.
- [5] Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, and Wanxiang Che. 2025. Towards reasoning era: A survey of long chain-of-thought for reasoning large language models. arXiv preprint arXiv:2503.09567 (2025).
- [6] Tsun-Hin Cheung and Kin-Man Lam. 2023. FactLLaMA: Optimizing instruction-following language models with external knowledge for automated fact-checking. In 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 846–853.
- [7] Yung-Sung Chuang, Yujia Xie, Hongyin Luo, Yoon Kim, James R Glass, and Pengcheng He. 2023. DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models. In The Twelfth International Conference on Learning Representations.
- [8] Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, and Rosanne Liu. [n. d.]. Plug and Play Language Models: A Simple Approach to Controlled Text Generation. In International Conference on Learning Representations.
- [9] Kanishk Gandhi, Ayush Chakravarthy, Anikait Singh, Nathan Lile, and Noah D Goodman. 2025. Cognitive behaviors that enable self-improving reasoners, or, four habits of highly effective STaRs. arXiv preprint arXiv:2503.01307 (2025).
- [10]
- [11] Gaurav Rohit Ghosal, Tatsunori Hashimoto, and Aditi Raghunathan. 2024. Understanding Finetuning for Factual Knowledge Extraction. In International Conference on Machine Learning. PMLR, 15540–15558.
- [12] Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xuehao Zhai, Chengjin Xu, Wei Li, Yinghan Shen, Shengjie Ma, Honghao Liu, Saizhuo Wang, Kun Zhang, Yuanzhuo Wang, Wen Gao, Lionel Ni, and Jian Guo. 2025. A Survey on LLM-as-a-Judge. arXiv:2411.15594 [cs.CL]. https://arxiv.org/abs/2411.15594
- [13]
- [14]
- [15] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021).
- [16] Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, et al. 2025. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems 43, 2 (2025), 1–55.
- [17] Shailza Jolly, Pepa Atanasova, and Isabelle Augenstein. 2022. Generating fluent fact checking explanations with unsupervised post-editing. Information 13, 10 (2022), 500.
- [18]
- [19]
- [20]
- [21] Ben Krause, Akhilesh Deepak Gotmare, Bryan McCann, Nitish Shirish Keskar, Shafiq Joty, Richard Socher, and Nazneen Fatema Rajani. 2021. GeDi: Generative Discriminator Guided Sequence Generation. In Findings of the Association for Computational Linguistics: EMNLP 2021. 4929–4952.
- [22] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems 33 (2020), 9459–9474.
- [23] Kenneth Li, Aspen K Hopkins, David Bau, Fernanda Viégas, Hanspeter Pfister, and Martin Wattenberg. [n. d.]. Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task. In The Eleventh International Conference on Learning Representations.
- [24] Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, and Martin Wattenberg. 2023. Inference-time intervention: Eliciting truthful answers from a language model. Advances in Neural Information Processing Systems 36 (2023), 41451–41530.
- [25] Xiang Li, John Thickstun, Ishaan Gulrajani, Percy S Liang, and Tatsunori B Hashimoto. 2022. Diffusion-LM improves controllable text generation. Advances in Neural Information Processing Systems 35 (2022), 4328–4343.
- [26] Stephanie Lin, Jacob Hilton, and Owain Evans. 2022. TruthfulQA: Measuring how models mimic human falsehoods. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 3214–3252.
- [27]
- [28] Jiacheng Liu, Alisa Liu, Ximing Lu, Sean Welleck, Peter West, Ronan Le Bras, Yejin Choi, and Hannaneh Hajishirzi. 2022. Generated Knowledge Prompting for Commonsense Reasoning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 3154–3169.
- [29] Jiachang Liu, Dinghan Shen, Yizhe Zhang, William B Dolan, Lawrence Carin, and Weizhu Chen. 2022. What Makes Good In-Context Examples for GPT-3? In Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures. 100–114.
- [30] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019).
- [31] Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, and Pontus Stenetorp.
- [32] Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 8086–8098.
- [33]
- [34] Jing Ma, Wei Gao, Shafiq Joty, and Kam-Fai Wong. 2019. Sentence-level evidence embedding for claim verification with hierarchical attention networks. Association for Computational Linguistics.
- [35] Melkamu Mersha, Khang Lam, Joseph Wood, Ali K Alshami, and Jugal Kalita. 2024. Explainable artificial intelligence: A survey of needs, techniques, applications, and future direction. Neurocomputing 599 (2024), 128111.
- [36] Sewon Min, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2022. Noisy Channel Language Model Prompting for Few-Shot Text Classification. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 5316–5330.
- [37] Tai Nguyen and Eric Wong. 2023. In-context Example Selection with Influences. arXiv e-prints (2023), arXiv–2302.
- [38] Yixin Nie, Haonan Chen, and Mohit Bansal. 2019. Combining fact extraction and verification with neural semantic matching networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 6859–6866.
- [39] OpenAI. 2023. Introducing ChatGPT. https://openai.com/blog/chatgpt
- [40]
- [41] Verónica Pérez-Rosas, Bennett Kleinberg, Alexandra Lefevre, and Rada Mihalcea.
- [42] Automatic detection of fake news. arXiv preprint arXiv:1708.07104 (2017).
- [43] Kashyap Popat, Subhabrata Mukherjee, Andrew Yates, and Gerhard Weikum.
- [44] DeClarE: Debunking fake news and false claims using evidence-aware deep learning. arXiv preprint arXiv:1809.06416 (2018).
- [45] Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, and Yejin Choi.
- [46] Truth of varying shades: Analyzing language in fake news and political fact-checking. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2931–2937.
- [47] John W. Ratcliff and David E. Metzener. 1988. Pattern Matching: The Gestalt Approach. Dr. Dobb's Journal 13, 7 (Jul 1988), 46.
- [48] Xuan Ren, Biao Wu, and Lingqiao Liu. 2024. I learn better if you speak my language: Enhancing large language model fine-tuning with style-aligned response adjustments. CoRR (2024).
- [49] Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, and Alexander Turner. 2024. Steering Llama 2 via Contrastive Activation Addition. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 15504–15522.
- [50] Daniel Russo, Serra Sinem Tekiroğlu, and Marco Guerini. 2023. Benchmarking the generation of fact checking explanations. Transactions of the Association for Computational Linguistics 11 (2023), 1250–1264.
- [51] Michael Schlichtkrull, Zhijiang Guo, and Andreas Vlachos. 2023. AVeriTeC: A dataset for real-world claim verification with evidence from the web. Advances in Neural Information Processing Systems 36 (2023), 65128–65167.
- [52] Tal Schuster, Roei Schuster, Darsh J Shah, and Regina Barzilay. 2020. The limitations of stylometry for detecting machine-generated fake news. Computational Linguistics 46, 2 (2020), 499–510.
- [53] Jiaming Shen, Jialu Liu, Dan Finnie, Negar Rahmati, Mike Bendersky, and Marc Najork. 2023. "Why is this misleading?": Detecting News Headline Hallucinations with Explanations. In Proceedings of the ACM Web Conference 2023. 1662–1672.
- [54] Kai Shu, Limeng Cui, Suhang Wang, Dongwon Lee, and Huan Liu. 2019. dEFEND: Explainable fake news detection. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 395–405.
- [55]
- [57] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023).
- [58] Bo Wang, Jing Ma, Hongzhan Lin, Zhiwei Yang, Ruichao Yang, Yuan Tian, and Yi Chang. 2024. Explainable fake news detection with large language model via defense among competing wisdom. In Proceedings of the ACM Web Conference.
- [59] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837.
- [60] Jiaying Wu, Jiafeng Guo, and Bryan Hooi. 2024. Fake news in sheep's clothing: Robust fake news detection against LLM-empowered style attacks. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 3367–3378.
- [61] Lianwei Wu, Yuan Rao, Ling Sun, and Wangbo He. 2021. Evidence inference networks for interpretable claim verification. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 14058–14066.
- [62] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. 2025. Qwen3 technical report. arXiv preprint arXiv:2505.09388 (2025).
- [63] Zhiwei Yang, Jing Ma, Hechang Chen, Hongzhan Lin, Ziyang Luo, and Yi Chang.
- [64]
- [65] Barry Menglong Yao, Aditya Shah, Lichao Sun, Jin-Hee Cho, and Lifu Huang. 2023. End-to-end multimodal fact-checking and explanation generation: A challenging dataset and models. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2733–2743.
- [66]
- [67] Eric Zelikman, Yuhuai Wu, Jesse Mu, and Noah D Goodman. 2024. STaR: Self-taught reasoner bootstrapping reasoning with reasoning. In Proc. the 36th International Conference on Neural Information Processing Systems, Vol. 1126.
- [68] Tianjun Zhang, Xuezhi Wang, Denny Zhou, Dale Schuurmans, and Joseph E Gonzalez. [n. d.]. TEMPERA: Test-Time Prompt Editing via Reinforcement Learning. In The Eleventh International Conference on Learning Representations.
- [69]
- [70] Eric Zhao, Pranjal Awasthi, and Nika Haghtalab. 2025. From Style to Facts: Mapping the Boundaries of Knowledge Injection with Finetuning. arXiv preprint arXiv:2503.05919 (2025).

Appendix excerpt (A Prompt Template): Following [5, 28], the prompt template we use to conduct training and inference for claims is as follows: A chat between a curious human and an artificial intellig...