pith. machine review for the scientific record.

arxiv: 2511.20233 · v3 · submitted 2025-11-25 · 💻 cs.CL

Recognition: no theorem link

REFLEX: Self-Refining Explainable Fact-Checking via Verdict-Anchored Style Control

Authors on Pith: no claims yet

Pith reviewed 2026-05-17 05:34 UTC · model grok-4.3

classification 💻 cs.CL
keywords fact-checking · explainable AI · large language models · self-refining · steering vectors · hallucination mitigation · misinformation

The pith

REFLEX disentangles fact from style in LLM fact-checking explanations by building verdict-anchored steering vectors from self-disagreement signals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces REFLEX, a self-refining paradigm for explainable fact-checking that explicitly controls the style of generated explanations by anchoring them to the verdict. It builds steering vectors from self-disagreement veracity signals between a backbone model and its fine-tuned variant to separate factual content from stylistic choices. This matters because existing LLM approaches often produce unfaithful or hallucinated rationales that can mislead users, especially when they depend on external knowledge sources that add latency and errors. A sympathetic reader would care if the approach delivers more reliable real-time fact-checking systems that need far fewer training samples while improving verdict accuracy.

Core claim

REFLEX is a self-refining paradigm that explicitly controls reasoning style anchored on verdict. It utilizes self-disagreement veracity signals between the backbone model and its fine-tuned variant to construct steering vectors that naturally disentangle fact from style. Experiments on real-world datasets show it reaches state-of-the-art performance under LLaMA-series models with only 465 self-refined samples, yields up to a 7.54% gain on in-the-wild data thanks to transferability, and mitigates faithful hallucination to produce more accurate verdicts than prior explainable fact-checking methods.

What carries the argument

Verdict-anchored style control via steering vectors constructed from self-disagreement veracity signals between a backbone LLM and its fine-tuned variant.
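
As a minimal sketch of that mechanism: assuming the steering vector is a mean activation difference over self-disagreement samples, added back into the hidden state at one layer (in the spirit of contrastive activation addition [49]), the construction would look roughly like the code below. All names are hypothetical; the paper's own implementation is not reproduced on this page.

    import torch

    def build_steering_vector(backbone_h, finetuned_h, disagree):
        # backbone_h, finetuned_h: [n_claims, hidden_dim] activations from
        # one chosen layer; disagree: [n_claims] bool, True where the two
        # models' verdicts differ (the self-disagreement veracity signal).
        return (finetuned_h[disagree] - backbone_h[disagree]).mean(dim=0)

    def steer(hidden, vector, multiplier=1.5):
        # Add the scaled vector to the hidden state at the intervention
        # layer during generation; Figure 4 reports layer 10 with
        # multiplier 1.5 for LLaMA2 on RAW-FC.
        return hidden + multiplier * vector

Whether REFLEX normalizes the vector, picks the layer per model (Figure 2 suggests the optimal layer varies), or steers only at certain token positions is not recoverable from this page.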

If this is right

  • REFLEX reaches state-of-the-art performance on LLaMA-series models using only 465 self-refined samples.
  • The approach delivers up to a 7.54% performance gain on in-the-wild data through its transferability.
  • REFLEX reduces faithful hallucination in explanations and supports more accurate verdicts than earlier explainable fact-checking systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The steering-vector technique could apply to other LLM tasks that require separating content accuracy from output style without large external datasets.
  • Lower sample requirements might allow fact-checking tools to adapt quickly to emerging misinformation topics with minimal retraining.
  • Testing the method across additional social-media platforms could show whether the fact-style separation holds for varied misinformation formats.

Load-bearing premise

That self-disagreement veracity signals between the backbone model and its fine-tuned variant can naturally disentangle fact from style without introducing new biases or requiring external validation.

What would settle it

A controlled test on a held-out fact-checking dataset in which REFLEX-generated explanations still exhibit style-induced misleading content or produce lower verdict accuracy than standard fine-tuning baselines.
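
In code terms, the verdict-accuracy half of that test is a one-line harness; reflex_predict and baseline_predict below are hypothetical wrappers, not released APIs.

    def verdict_accuracy(predict, claims, gold):
        # Fraction of held-out claims whose predicted verdict matches gold.
        return sum(predict(c) == g for c, g in zip(claims, gold)) / len(claims)

    # The core claim would be undermined if, on a held-out set,
    # verdict_accuracy(baseline_predict, ...) matched or exceeded
    # verdict_accuracy(reflex_predict, ...), or if REFLEX explanations
    # still scored high on a style-misleadingness metric.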

Figures

Figures reproduced from arXiv: 2511.20233 by Chuyi Kong, Gao Wei, Hongzhan Lin, Jing Ma, Yuxi Sun.

Figure 1. The brief outline of our three-stage REFLEX paradigm. The red text denotes reasoning style learned from fine-tuning…
Figure 2. Optimal Layer for improving pairs across different…
Figure 4. The Redundancy Noise Pattern in LLaMA2 on RAW-FC, layer 10 with IV, multiplier 1.5. Red tokens denote alignment…
original abstract

The prevalence of fake news on social media demands automated fact-checking systems to provide accurate verdicts with faithful explanations. However, existing large language model (LLM)-based approaches ignore deceptive misinformation styles in LLM-generated explanations, resulting in unfaithful rationales that can mislead human judgments. They rely heavily on external knowledge sources, introducing hallucinations and even high latency that undermine reliability and responsiveness, which is crucial for real-time use. To address these challenges, we propose REason-guided Fact-checking with Latent EXplanations (REFLEX), a self-refining paradigm that explicitly controls reasoning style anchored on verdict. REFLEX utilizes self-disagreement veracity signals between the backbone model and its fine-tuned variant to construct steering vectors, naturally disentangling fact from style. Experiments on the real-world dataset show REFLEX achieves state-of-the-art performance under LLaMA-series models with only 465 self-refined samples. Moreover, owing to its transferability, REFLEX yields up to a 7.54% gain on in-the-wild data. Our results further demonstrate that our method effectively mitigates faithful hallucination, thereby guiding the model toward more accurate verdicts than previous works in explainable fact-checking.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes REFLEX, a self-refining paradigm for explainable fact-checking. It constructs steering vectors from self-disagreement veracity signals between a backbone LLaMA model and its fine-tuned variant on 465 samples to enable verdict-anchored style control. This is intended to disentangle fact from deceptive style in explanations, reducing faithful hallucinations without external knowledge. The authors claim SOTA performance under LLaMA-series models on real-world data, up to 7.54% gains on in-the-wild data, and improved mitigation of faithful hallucination.

Significance. If the results and the disentanglement hold under rigorous controls, REFLEX could offer a practical advance in LLM-based fact-checking by achieving strong performance and style control with minimal self-refined data and without external retrieval. The reported transferability to in-the-wild settings and the focus on mitigating unfaithful rationales are potentially valuable for real-time applications. The small sample count (465) would be a notable efficiency strength if the evaluation demonstrates clear separation of style from factuality.

major comments (2)
  1. [Abstract] The central performance claims (SOTA results, 7.54% in-the-wild gain, and mitigation of faithful hallucination) are presented without any mention of baselines, experimental controls, error bars, statistical significance, or criteria for selecting the 465 samples. This absence is load-bearing because the soundness of the reported gains cannot be assessed from the provided information.
  2. [Method] The steering vectors are derived from disagreement between the backbone and its fine-tuned variant on the same 465 samples. Without a controlled measurement or ablation demonstrating that these vectors modulate explanation style independently of factuality (rather than capturing calibration artifacts or uncertainty), the claim that self-disagreement naturally disentangles fact from style remains unverified and risks circular reinforcement of the fine-tuned model's outputs.
minor comments (2)
  1. Clarify whether the 465 self-refined samples are drawn from the evaluation distribution or held out, and provide the exact fine-tuning procedure for the variant model to allow reproducibility.
  2. [Introduction] The abstract uses the term 'faithful hallucination' without a precise definition or reference to prior usage; a brief operational definition in the introduction would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our manuscript. We address each of the major comments below and outline the revisions we will make to improve the paper.

point-by-point responses
  1. Referee: [Abstract] The central performance claims (SOTA results, 7.54% in-the-wild gain, and mitigation of faithful hallucination) are presented without any mention of baselines, experimental controls, error bars, statistical significance, or criteria for selecting the 465 samples. This absence is load-bearing because the soundness of the reported gains cannot be assessed from the provided information.

    Authors: We concur that the abstract lacks sufficient detail to fully contextualize our claims. To address this, we will revise the abstract to include references to the baselines (such as standard fine-tuned LLMs and prior explainable fact-checking approaches), experimental controls including multiple evaluation runs, error bars representing standard deviations, statistical significance via appropriate tests, and the selection criteria for the 465 samples as a randomly sampled balanced subset from the available training data. These additions will make the performance claims more transparent and assessable. revision: yes
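
For the significance half of that promise, a distribution-free check such as a permutation test over per-run scores would suffice; the sketch below is illustrative, not the authors' procedure.

    import random, statistics

    def permutation_test(a, b, iters=10000, seed=0):
        # Two-sided permutation test on the difference of means between
        # per-run scores of two systems (a, b): a simple way to back
        # significance claims without distributional assumptions.
        rng = random.Random(seed)
        observed = abs(statistics.mean(a) - statistics.mean(b))
        pooled = a + b
        hits = 0
        for _ in range(iters):
            rng.shuffle(pooled)
            x, y = pooled[:len(a)], pooled[len(a):]
            if abs(statistics.mean(x) - statistics.mean(y)) >= observed:
                hits += 1
        return hits / iters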

  2. Referee: [Method] The steering vectors are derived from disagreement between the backbone and its fine-tuned variant on the same 465 samples. Without a controlled measurement or ablation demonstrating that these vectors modulate explanation style independently of factuality (rather than capturing calibration artifacts or uncertainty), the claim that self-disagreement naturally disentangles fact from style remains unverified and risks circular reinforcement of the fine-tuned model's outputs.

    Authors: This is a valid concern about the verification of the disentanglement mechanism. Our current experiments demonstrate that applying the steering vectors improves both verdict accuracy and explanation quality over the fine-tuned model, suggesting the signals capture useful style information. However, to more rigorously demonstrate independence from factuality and rule out calibration artifacts, we will add controlled ablations in the revised manuscript. These will include comparisons with steering vectors from non-veracity disagreements and quantitative measures of style (e.g., via perplexity on style-specific prompts) versus factuality metrics. We believe this will substantiate the claim without circularity. revision: yes
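
One concrete shape the promised control ablation could take, sketched with hypothetical names: build a second vector from a size-matched random mask and compare what each moves.

    import torch

    def mean_diff(backbone_h, finetuned_h, mask):
        # Mean hidden-state difference over the masked samples.
        return (finetuned_h[mask] - backbone_h[mask]).mean(dim=0)

    def veracity_vs_control(backbone_h, finetuned_h, disagree):
        # Control: a random mask with the same expected density as the
        # veracity-disagreement mask. If only the veracity vector moves
        # verdict accuracy while both move style metrics, the
        # disentanglement claim gains support.
        control = torch.rand(disagree.shape[0]) < disagree.float().mean()
        return (mean_diff(backbone_h, finetuned_h, disagree),
                mean_diff(backbone_h, finetuned_h, control))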

Circularity Check

0 steps flagged

No significant circularity; method claims rest on independent experimental validation

full rationale

The paper proposes REFLEX as a self-refining paradigm that constructs steering vectors from self-disagreement signals between a backbone model and its fine-tuned variant on 465 samples, claiming this naturally disentangles fact from style. Performance is reported via SOTA results on real-world datasets and up to 7.54% gains on separate in-the-wild data, with explicit mitigation of faithful hallucination. No quoted derivation step reduces by construction to its inputs, no fitted parameter is relabeled as a prediction, and no load-bearing premise relies on a self-citation chain. The approach is evaluated against external benchmarks and does not exhibit any of the enumerated circular patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The method rests on the assumption that model disagreement provides clean veracity signals separable from style; the sample count of 465 is the one explicitly stated choice, while the fine-tuning procedure functions as an implicit one.

free parameters (1)
  • number of self-refined samples
    Explicitly stated as 465; chosen to achieve reported performance.
axioms (1)
  • domain assumption Self-disagreement between backbone and fine-tuned model yields disentangled fact-style signals
    Invoked in the description of constructing steering vectors.

pith-pipeline@v0.9.0 · 5522 in / 1235 out tokens · 41742 ms · 2026-05-17T05:34:14.766185+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

69 extracted references · 69 canonical work pages · 9 internal anchors

  1. [1]

    Massih-Reza Amini, Vasilii Feofanov, Loic Pauletto, Lies Hadjadj, Emilie Devijver, and Yury Maximov. 2025. Self-training: A survey. Neurocomputing 616 (2025), 128904.

  2. [2]

    Pepa Atanasova. 2024. Generating fact checking explanations. In Accountable and Explainable Methods for Complex Reasoning over Text. Springer, 83–103.

  3. [3]

    Lukas Berglund, Asa Cooper Stickland, Mikita Balesni, Max Kaufmann, Meg Tong, Tomasz Korbak, Daniel Kokotajlo, and Owain Evans. 2023. Taken out of context: On measuring situational awareness in LLMs. arXiv preprint arXiv:2309.00667 (2023).

  4. [4]

    Collin Burns, Haotian Ye, Dan Klein, and Jacob Steinhardt. [n. d.]. Discovering Latent Knowledge in Language Models Without Supervision. In The Eleventh International Conference on Learning Representations.

  5. [5]

    Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, and Wanxiang Che. 2025. Towards reasoning era: A survey of long chain-of-thought for reasoning large language models. arXiv preprint arXiv:2503.09567 (2025).

  6. [6]

    Tsun-Hin Cheung and Kin-Man Lam. 2023. FactLLaMA: Optimizing instruction-following language models with external knowledge for automated fact-checking. In 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 846–853.

  7. [7]

    Yung-Sung Chuang, Yujia Xie, Hongyin Luo, Yoon Kim, James R Glass, and Pengcheng He. 2023. DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models. In The Twelfth International Conference on Learning Representations.

  8. [8]

    Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, and Rosanne Liu. [n. d.]. Plug and Play Language Models: A Simple Approach to Controlled Text Generation. In International Conference on Learning Representations.

  9. [9]

    Kanishk Gandhi, Ayush Chakravarthy, Anikait Singh, Nathan Lile, and Noah D Goodman. 2025. Cognitive behaviors that enable self-improving reasoners, or, four habits of highly effective STaRs. arXiv preprint arXiv:2503.01307 (2025).

  10. [10]

    Zorik Gekhman, Gal Yona, Roee Aharoni, Matan Eyal, Amir Feder, Roi Reichart, and Jonathan Herzig. 2024. Does fine-tuning LLMs on new knowledge encourage hallucinations? arXiv preprint arXiv:2405.05904 (2024).

  11. [11]

    Gaurav Rohit Ghosal, Tatsunori Hashimoto, and Aditi Raghunathan. 2024. Understanding Finetuning for Factual Knowledge Extraction. In International Conference on Machine Learning. PMLR, 15540–15558.

  12. [12]

    Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xuehao Zhai, Chengjin Xu, Wei Li, Yinghan Shen, Shengjie Ma, Honghao Liu, Saizhuo Wang, Kun Zhang, Yuanzhuo Wang, Wen Gao, Lionel Ni, and Jian Guo. 2025. A Survey on LLM-as-a-Judge. arXiv:2411.15594 [cs.CL]. https://arxiv.org/abs/2411.15594

  13. [13]

    Chi Han, Jialiang Xu, Manling Li, Yi Fung, Chenkai Sun, Nan Jiang, Tarek Abdelzaher, and Heng Ji. 2023. Word embeddings are steers for language models. arXiv preprint arXiv:2305.12798 (2023).

  14. [14]

    Evan Hernandez, Belinda Z Li, and Jacob Andreas. 2023. Inspecting and editing knowledge representations in language models. arXiv preprint arXiv:2304.00740 (2023).

  15. [15]

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021).

  16. [16]

    Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, et al. 2025. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems 43, 2 (2025), 1–55.

  17. [17]

    Shailza Jolly, Pepa Atanasova, and Isabelle Augenstein. 2022. Generating fluent fact checking explanations with unsupervised post-editing. Information 13, 10 (2022), 500.

  18. [18]

    Aobo Kong, Shiwan Zhao, Hao Chen, Qicheng Li, Yong Qin, Ruiqi Sun, Xin Zhou, Enzhi Wang, and Xiaohang Dong. 2023. Better zero-shot reasoning with role-play prompting. arXiv preprint arXiv:2308.07702 (2023).

  19. [19]

    Aobo Kong, Shiwan Zhao, Hao Chen, Qicheng Li, Yong Qin, Ruiqi Sun, Xin Zhou, Jiaming Zhou, and Haoqin Sun. 2024. Self-prompt tuning: Enable autonomous role-playing in LLMs. arXiv preprint arXiv:2407.08995 (2024).

  20. [20]

    Neema Kotonya and Francesca Toni. 2020. Explainable automated fact-checking for public health claims. arXiv preprint arXiv:2010.09926 (2020).

  21. [21]

    Ben Krause, Akhilesh Deepak Gotmare, Bryan McCann, Nitish Shirish Keskar, Shafiq Joty, Richard Socher, and Nazneen Fatema Rajani. 2021. GeDi: Generative Discriminator Guided Sequence Generation. In Findings of the Association for Computational Linguistics: EMNLP 2021. 4929–4952.

  22. [22]

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems 33 (2020), 9459–9474.

  23. [23]

    Kenneth Li, Aspen K Hopkins, David Bau, Fernanda Viégas, Hanspeter Pfister, and Martin Wattenberg. [n. d.]. Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task. In The Eleventh International Conference on Learning Representations.

  24. [24]

    Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, and Martin Wattenberg. 2023. Inference-time intervention: Eliciting truthful answers from a language model. Advances in Neural Information Processing Systems 36 (2023), 41451–41530.

  25. [25]

    Xiang Li, John Thickstun, Ishaan Gulrajani, Percy S Liang, and Tatsunori B Hashimoto. 2022. Diffusion-LM improves controllable text generation. Advances in Neural Information Processing Systems 35 (2022), 4328–4343.

  26. [26]

    Stephanie Lin, Jacob Hilton, and Owain Evans. 2022. TruthfulQA: Measuring how models mimic human falsehoods. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 3214–3252.

  27. [27]

    Philip Lippmann and Jie Yang. 2025. Style over Substance: Distilled Language Models Reason Via Stylistic Replication. arXiv preprint arXiv:2504.01738 (2025).

  28. [28]

    Jiacheng Liu, Alisa Liu, Ximing Lu, Sean Welleck, Peter West, Ronan Le Bras, Yejin Choi, and Hannaneh Hajishirzi. 2022. Generated Knowledge Prompting for Commonsense Reasoning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 3154–3169.

  29. [29]

    Jiachang Liu, Dinghan Shen, Yizhe Zhang, William B Dolan, Lawrence Carin, and Weizhu Chen. 2022. What Makes Good In-Context Examples for GPT-3? In Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures. 100–114.

  30. [30]

    Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019).

  31. [31–32]

    Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, and Pontus Stenetorp. 2022. Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 8086–8098.

  32. [33]

    Yi-Ju Lu and Cheng-Te Li. 2020. GCAN: Graph-aware co-attention networks for explainable fake news detection on social media. arXiv preprint arXiv:2004.11648 (2020).

  33. [34]

    Jing Ma, Wei Gao, Shafiq Joty, and Kam-Fai Wong. 2019. Sentence-level evidence embedding for claim verification with hierarchical attention networks. Association for Computational Linguistics.

  34. [35]

    Melkamu Mersha, Khang Lam, Joseph Wood, Ali K Alshami, and Jugal Kalita. 2024. Explainable artificial intelligence: A survey of needs, techniques, applications, and future direction. Neurocomputing 599 (2024), 128111.

  35. [36]

    Sewon Min, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2022. Noisy Channel Language Model Prompting for Few-Shot Text Classification. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 5316–5330.

  36. [37]

    Tai Nguyen and Eric Wong. 2023. In-context Example Selection with Influences. arXiv e-prints (2023), arXiv–2302.

  37. [38]

    Yixin Nie, Haonan Chen, and Mohit Bansal. 2019. Combining fact extraction and verification with neural semantic matching networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 6859–6866.

  38. [39]

    OpenAI. 2023. Introducing ChatGPT. https://openai.com/blog/chatgpt

  39. [40]

    Seongheon Park, Xuefeng Du, Min-Hsuan Yeh, Haobo Wang, and Yixuan Li. 2025. Steer LLM Latents for Hallucination Detection. arXiv preprint arXiv:2503.01917 (2025).

  40. [41–42]

    Verónica Pérez-Rosas, Bennett Kleinberg, Alexandra Lefevre, and Rada Mihalcea. 2017. Automatic detection of fake news. arXiv preprint arXiv:1708.07104 (2017).

  41. [43–44]

    Kashyap Popat, Subhabrata Mukherjee, Andrew Yates, and Gerhard Weikum. 2018. DeClarE: Debunking fake news and false claims using evidence-aware deep learning. arXiv preprint arXiv:1809.06416 (2018).

  42. [45–46]

    Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, and Yejin Choi. 2017. Truth of varying shades: Analyzing language in fake news and political fact-checking. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2931–2937.

  43. [47]

    John W. Ratcliff and David E. Metzener. 1988. Pattern Matching: The Gestalt Approach. Dr. Dobb's Journal 13, 7 (Jul 1988), 46.

  44. [48]

    Xuan Ren, Biao Wu, and Lingqiao Liu. 2024. I learn better if you speak my language: Enhancing large language model fine-tuning with style-aligned response adjustments. CoRR (2024).

  45. [49]

    Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, and Alexander Turner. 2024. Steering Llama 2 via Contrastive Activation Addition. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 15504–15522.

  46. [50]

    Daniel Russo, Serra Sinem Tekiroğlu, and Marco Guerini. 2023. Benchmarking the generation of fact checking explanations. Transactions of the Association for Computational Linguistics 11 (2023), 1250–1264.

  47. [51]

    Michael Schlichtkrull, Zhijiang Guo, and Andreas Vlachos. 2023. AVeriTeC: A dataset for real-world claim verification with evidence from the web. Advances in Neural Information Processing Systems 36 (2023), 65128–65167.

  48. [52]

    Tal Schuster, Roei Schuster, Darsh J Shah, and Regina Barzilay. 2020. The limitations of stylometry for detecting machine-generated fake news. Computational Linguistics 46, 2 (2020), 499–510.

  49. [53]

    Jiaming Shen, Jialu Liu, Dan Finnie, Negar Rahmati, Mike Bendersky, and Marc Najork. 2023. "Why is this misleading?": Detecting News Headline Hallucinations with Explanations. In Proceedings of the ACM Web Conference 2023. 1662–1672.

  50. [54]

    Kai Shu, Limeng Cui, Suhang Wang, Dongwon Lee, and Huan Liu. 2019. dEFEND: Explainable fake news detection. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 395–405.

  51. [55]

    Satyam Shukla, Himanshu Dutta, and Pushpak Bhattacharyya. 2025. Recon, Answer, Verify: Agents in Search of Truth. arXiv preprint arXiv:2507.03671 (2025).

  52. [57]

    Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023).

  53. [58]

    Bo Wang, Jing Ma, Hongzhan Lin, Zhiwei Yang, Ruichao Yang, Yuan Tian, and Yi Chang. 2024. Explainable fake news detection with large language model via defense among competing wisdom. In Proceedings of the ACM Web Conference 2024.

  54. [59]

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837.

  55. [60]

    Jiaying Wu, Jiafeng Guo, and Bryan Hooi. 2024. Fake news in sheep's clothing: Robust fake news detection against LLM-empowered style attacks. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 3367–3378.

  56. [61]

    Lianwei Wu, Yuan Rao, Ling Sun, and Wangbo He. 2021. Evidence inference networks for interpretable claim verification. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 14058–14066.

  57. [62]

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. 2025. Qwen3 technical report. arXiv preprint arXiv:2505.09388 (2025).

  58. [63–64]

    Zhiwei Yang, Jing Ma, Hechang Chen, Hongzhan Lin, Ziyang Luo, and Yi Chang. 2022. A coarse-to-fine cascaded evidence-distillation neural network for explainable fake news detection. arXiv preprint arXiv:2209.14642 (2022).

  59. [65]

    Barry Menglong Yao, Aditya Shah, Lichao Sun, Jin-Hee Cho, and Lifu Huang. 2023. End-to-end multimodal fact-checking and explanation generation: A challenging dataset and models. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2733–2743.

  60. [66]

    Zeyu Yun, Yubei Chen, Bruno A Olshausen, and Yann LeCun. 2021. Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors. arXiv preprint arXiv:2103.15949 (2021).

  61. [67]

    Eric Zelikman, Yuhuai Wu, Jesse Mu, and Noah D Goodman. 2024. STaR: Self-taught reasoner bootstrapping reasoning with reasoning. In Proc. the 36th International Conference on Neural Information Processing Systems, Vol. 1126.

  62. [68]

    Tianjun Zhang, Xuezhi Wang, Denny Zhou, Dale Schuurmans, and Joseph E Gonzalez. [n. d.]. TEMPERA: Test-Time Prompt Editing via Reinforcement Learning. In The Eleventh International Conference on Learning Representations.

  63. [69]

    Xuan Zhang and Wei Gao. 2023. Towards LLM-based fact verification on news claims with a hierarchical step-by-step prompting method. arXiv preprint arXiv:2310.00305 (2023).

  64. [70]

    Eric Zhao, Pranjal Awasthi, and Nika Haghtalab. 2025. From Style to Facts: Mapping the Boundaries of Knowledge Injection with Finetuning. arXiv preprint arXiv:2503.05919 (2025).