pith. machine review for the scientific record.

arxiv: 2604.18272 · v1 · submitted 2026-04-20 · 💻 cs.CE

Recognition: unknown

MFMDQwen: Multilingual Financial Misinformation Detection Based on Large Language Model

Jimin Huang, Sophia Ananiadou, Tianlei Zhu, Xiaorui Guo, Xiao-Yang Liu, Yuechen Jiang, Yupeng Cao, Yuyan Wang, Zhiwei Liu, Zhiyang Deng, Zhiyuan Yao

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 03:18 UTC · model grok-4.3

classification 💻 cs.CE
keywords financial misinformation · multilingual detection · large language models · instruction tuning · benchmark dataset · open-source model · financial markets · misinformation detection

The pith

MFMDQwen is the first open-source large language model built to detect financial misinformation in English, Chinese, Greek, and Bengali.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Financial misinformation threatens market stability and investor decisions, yet existing tools focus mainly on English and single tasks. The paper introduces MFMDQwen by adapting a base large language model with instruction tuning on a new dataset called MFMD4Instruction that covers four languages. It also releases MFMDBench to test model performance on detection tasks. Experiments show the resulting model outperforms other open-source large language models on the benchmark. This matters because it supplies an accessible starting point for monitoring false financial claims in global, multilingual settings.

Core claim

The paper presents MFMDQwen as the first open-source LLM designed for multilingual financial misinformation detection tasks. It supports this with MFMD4Instruction, the first instruction dataset for such tasks that covers English, Chinese, Greek, and Bengali, along with MFMDBench as a dedicated evaluation benchmark. Experimental results on MFMDBench show that MFMDQwen outperforms existing open-source LLMs.

What carries the argument

MFMDQwen, a large language model fine-tuned via instruction tuning on the MFMD4Instruction dataset to handle detection, classification, and related tasks across four languages.
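The paper's exact instruction format is not given in the material above; as an illustration only, a single MFMD4Instruction-style record for instruction tuning might look like the following (all field names and the example claim are hypothetical):

```python
import json

# Hypothetical shape of one instruction-tuning record; the actual
# MFMD4Instruction schema is not specified in the material above.
record = {
    "task": "misinformation_detection",
    "language": "zh",
    "instruction": (
        "Decide whether the following financial claim is misinformation. "
        "Answer with 'real' or 'fake'."
    ),
    "input": "某公司宣布其季度利润增长了300%，但未提交任何财报。",
    "output": "fake",
}

# Instruction tuning trains the model on serialized prompt/response pairs.
prompt = f"{record['instruction']}\n\n{record['input']}"
print(json.dumps(record, ensure_ascii=False, indent=2))
```

Records of this kind, one per task and language, are what would let a single generative model handle detection and classification jointly.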

If this is right

  • Detection of financial misinformation becomes feasible in languages other than English using an open model.
  • MFMDBench supplies a public standard for measuring progress on multilingual financial tasks.
  • Specialized instruction tuning demonstrates a route to stronger performance on domain-specific detection problems.
  • Regulators and platforms gain a concrete open tool for addressing false financial claims in international contexts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same instruction-tuning pattern could be applied to misinformation detection in additional languages or adjacent domains such as health claims.
  • Pairing the model with live market data feeds might allow earlier flagging of coordinated false narratives.
  • The work leaves open how well the model handles entirely new misinformation tactics that emerge after the benchmark was built.
  • Future tests could check whether adding numerical financial data alongside text improves verification accuracy.

Load-bearing premise

The MFMD4Instruction and MFMDBench datasets capture the real complexity and multilingual character of financial misinformation well enough for performance gains to transfer.

What would settle it

Running MFMDQwen on a new, independently gathered set of financial misinformation examples in the four languages, none of which were seen during dataset construction or benchmark creation, would show whether the reported outperformance persists.

Figures

Figures reproduced from arXiv: 2604.18272 by Jimin Huang, Sophia Ananiadou, Tianlei Zhu, Xiaorui Guo, Xiao-Yang Liu, Yuechen Jiang, Yupeng Cao, Yuyan Wang, Zhiwei Liu, Zhiyang Deng, Zhiyuan Yao.

Figure 1
Figure 1. The architecture of MFMDQwen. (The spilled caption text comes from Section 3.1, which formulates financial misinformation detection as a generative task: an autoregressive language model Pϕ(y | x), parameterized by pre-trained weights ϕ, handles multiple multilingual detection tasks simultaneously.) view at source ↗
Figure 2
Figure 2. Confusion matrices for five binary classification… (caption truncated in extraction) view at source ↗
Figure 3
Figure 3. Confusion matrices for four datasets with het… (caption truncated in extraction) view at source ↗
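The task formalization excerpted in the Figure 1 caption treats each detection task as text generation. In standard notation (a reconstruction from the caption's symbols, not the paper's verbatim equation), the autoregressive model factorizes as:

```latex
P_\phi(y \mid x) \;=\; \prod_{i=1}^{|y|} P_\phi\!\left(y_i \mid y_{<i},\, x\right)
```

where $x$ is the instruction plus input text for a given task, $y$ is the generated label or explanation, and $\phi$ denotes the pre-trained weights being fine-tuned.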
read the original abstract

Financial misinformation poses significant threats to financial market stability and individuals' investment decisions. The multilingual environment and the inherent complexity of financial information present substantial challenges for Multilingual Financial Misinformation Detection (MFMD). Existing LLM-based approaches for financial misinformation detection primarily focus on English and a single financial misinformation detection task, which limits their ability to capture multilingual contexts and complex features. In this paper, we propose MFMDQwen, the first open-source LLM designed for MFMD tasks. Furthermore, we introduce MFMD4Instruction, the first instruction dataset supporting MFMD with LLMs, covering English, Chinese, Greek, and Bengali. We also construct MFMDBench, a benchmark dataset for evaluating the MFMD capabilities of LLMs. Experimental results on MFMDBench demonstrate that our model outperforms existing open-source LLMs. The project is available at https://github.com/lzw108/FMD.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes MFMDQwen, the first open-source LLM for multilingual financial misinformation detection (MFMD). It introduces MFMD4Instruction, the first instruction-tuning dataset for MFMD covering English, Chinese, Greek, and Bengali, along with MFMDBench as a new evaluation benchmark. The central claim is that MFMDQwen outperforms existing open-source LLMs on MFMDBench.

Significance. If the datasets prove representative and the performance gains hold under rigorous scrutiny, this work would provide a useful open-source foundation for multilingual financial misinformation detection, addressing a gap where most LLM efforts remain English-only. The release of new instruction data and a benchmark is a concrete contribution that could enable follow-on research.

major comments (3)
  1. The descriptions of MFMD4Instruction and MFMDBench supply no sample sizes, class balance statistics, annotation protocol, inter-annotator agreement figures, or train/test split details. Because the outperformance claim rests entirely on results from these newly constructed resources, the absence of this information prevents verification that the benchmark is unbiased and free of leakage from the instruction-tuning stage.
  2. The experimental section provides no concrete evaluation metrics (accuracy, F1, etc.), baseline model names and versions, training hyperparameters, or ablation results. Without these, the abstract's assertion that MFMDQwen outperforms other open-source LLMs cannot be assessed for statistical significance or robustness.
  3. The model description does not specify the exact fine-tuning procedure, instruction format, or rationale for selecting Qwen as the base model over other multilingual LLMs. This choice is load-bearing for the claim that the resulting system is particularly suited to MFMD tasks.
minor comments (1)
  1. Ensure the GitHub repository contains the full dataset construction scripts, annotation guidelines, and exact evaluation code to support reproducibility claims.
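The per-language metrics the referee asks for in major comment 2 are standard to compute; a minimal pure-Python sketch, with made-up gold/prediction labels and an assumed binary 'real'/'fake' scheme:

```python
from collections import defaultdict

def binary_f1(golds, preds, positive="fake"):
    """F1 for the positive class from parallel gold/prediction lists."""
    tp = sum(g == positive and p == positive for g, p in zip(golds, preds))
    fp = sum(g != positive and p == positive for g, p in zip(golds, preds))
    fn = sum(g == positive and p != positive for g, p in zip(golds, preds))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def f1_by_language(examples):
    """examples: iterable of (language, gold, pred) triples; returns F1 per language."""
    buckets = defaultdict(lambda: ([], []))
    for lang, gold, pred in examples:
        buckets[lang][0].append(gold)
        buckets[lang][1].append(pred)
    return {lang: binary_f1(g, p) for lang, (g, p) in buckets.items()}

# Illustrative data, not results from the paper.
examples = [
    ("en", "fake", "fake"), ("en", "real", "fake"),
    ("zh", "fake", "fake"), ("zh", "real", "real"),
]
print(f1_by_language(examples))
```

Reporting this breakdown per language, alongside accuracy, precision, and recall, is what would make the outperformance claim checkable.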

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback. We agree that the manuscript requires substantial additional details on datasets, experiments, and model choices to support the claims and enable verification. We will revise the paper accordingly.

read point-by-point responses
  1. Referee: The descriptions of MFMD4Instruction and MFMDBench supply no sample sizes, class balance statistics, annotation protocol, inter-annotator agreement figures, or train/test split details. Because the outperformance claim rests entirely on results from these newly constructed resources, the absence of this information prevents verification that the benchmark is unbiased and free of leakage from the instruction-tuning stage.

    Authors: We agree that these details are essential for assessing bias, leakage, and reproducibility. The current manuscript does not include them. In the revised version, we will add: exact sample sizes per language and class, class balance statistics, the full annotation protocol (including guidelines, annotator training, and quality control), inter-annotator agreement scores, and explicit train/test split descriptions with overlap checks between MFMD4Instruction and MFMDBench. revision: yes

  2. Referee: The experimental section provides no concrete evaluation metrics (accuracy, F1, etc.), baseline model names and versions, training hyperparameters, or ablation results. Without these, the abstract's assertion that MFMDQwen outperforms other open-source LLMs cannot be assessed for statistical significance or robustness.

    Authors: We acknowledge the experimental section is under-specified. We will expand it to report concrete metrics (accuracy, F1, precision, recall) per language and overall, list exact baseline models with versions and sources, provide all training hyperparameters, and include ablation results. We will also add statistical significance testing to substantiate the outperformance claims. revision: yes

  3. Referee: The model description does not specify the exact fine-tuning procedure, instruction format, or rationale for selecting Qwen as the base model over other multilingual LLMs. This choice is load-bearing for the claim that the resulting system is particularly suited to MFMD tasks.

    Authors: We agree more detail is needed on the model. The revision will specify the fine-tuning procedure (including method, epochs, and any PEFT settings), the exact instruction format/template, and the rationale for Qwen (its multilingual coverage for English, Chinese, Greek, and Bengali, plus efficiency and domain suitability), with brief comparisons to alternatives such as BLOOM or mT5. revision: yes
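Response 3 mentions possible PEFT settings. As a hedged illustration only, using the Hugging Face `peft` library, a LoRA configuration for a Qwen-style causal LM might look like the following; the rank, scaling, and target modules here are invented, not the authors' configuration:

```python
from peft import LoraConfig

# Illustrative values only; the paper's actual PEFT settings (if any)
# are promised for the revision.
lora_config = LoraConfig(
    r=16,                                  # low-rank adapter dimension
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections in Qwen-style models
    task_type="CAUSAL_LM",
)
```

Such a config would be passed to `peft.get_peft_model` together with the base model before instruction tuning.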

Circularity Check

0 steps flagged

No circularity: the claims rest on new datasets and standard LLM fine-tuning and evaluation, and do not reduce to their inputs by construction.

full rationale

The paper introduces MFMDQwen as a fine-tuned LLM, constructs MFMD4Instruction for instruction tuning, and MFMDBench for evaluation, then reports outperformance on the benchmark. No equations, parameter-fitting steps, or self-citations are present that would make the outperformance claim equivalent to the input data by definition. The derivation chain is self-contained: new task-specific data and model are created, then tested on held-out benchmark data. This is the common case of an applied ML paper with no load-bearing circular elements.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The approach relies on standard LLM fine-tuning assumptions and on newly created datasets for which no independent validation is mentioned.

free parameters (2)
  • Base LLM choice (Qwen)
    The model is based on Qwen, with parameters from prior work.
  • Instruction tuning parameters
Fine-tuning hyperparameters are not detailed in the abstract.
axioms (1)
  • domain assumption Fine-tuned LLMs can capture complex features of financial misinformation in multiple languages
    Central to the proposal of MFMDQwen.

pith-pipeline@v0.9.0 · 5488 in / 1092 out tokens · 36862 ms · 2026-05-10T03:18:00.383671+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and pith papers without signing in.

Reference graph

Works this paper leans on

51 extracted references · 27 canonical work pages · 15 internal anchors

  1. [1] EmoLLMs: A series of emotional large language models and annotation tools for comprehensive affective analysis. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.

  2. [2] ROUGE: A package for automatic evaluation of summaries. Text Summarization Branches Out.

  3. [3] BERTScore: Evaluating text generation with BERT. arXiv:1904.09675.

  4. [4] The RefinedWeb dataset for Falcon LLM: Outperforming curated corpora with web data, and web data only. arXiv:2306.01116.

  5. [5] Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288.

  6. [6] Gemma: Open models based on Gemini research and technology. arXiv:2403.08295.

  7. [7] Mistral 7B. arXiv:2310.06825.

  8. [8] BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805.

  9. [9] RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692.

  10. [10] Decoupled weight decay regularization. arXiv:1711.05101.

  11. [11] DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.

  12. [12] Fin-Fact: A benchmark dataset for multimodal financial fact checking and explanation generation. 2023.

  13. [13] Financial misinformation detection via RoBERTa and multi-channel networks. International Conference on Pattern Recognition and Machine Intelligence, 2023.

  14. [14] A theory-based deep-learning approach to detecting disinformation in financial social media. Information Systems Frontiers, 2023.

  15. [15] Financial fake news detection via context-aware embedding and sequential representation using cross-joint networks. 2023 15th International Conference on COMmunication Systems & NETworkS (COMSNETS).

  16. [16] ConspEmoLLM: Conspiracy theory detection using an emotion-based large language model. arXiv:2403.06765.

  17. [18] MentaLLaMA: Interpretable mental health analysis on social media with large language models. arXiv:2309.13567.

  18. [19] Building emotional support chatbots in the era of LLMs. arXiv:2308.11584.

  19. [20] When FLUE meets FLANG: Benchmarks and large pre-trained language model for financial domain. arXiv:2211.00083.

  20. [21] Fake news in financial markets. 2020.

  21. [22] The Llama 3 herd of models. arXiv:2407.21783.

  22. [23] OpenAI. 2025.

  23. [24] Qwen2.5 technical report. 2025.

  24. [25] Qwen3 technical report. arXiv:2505.09388.

  25. [26] Introducing GPT-4.1. 2025.

  26. [27] Introducing Claude Sonnet 4.5. 2025.

  27. [28] Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv:2507.06261.

  28. [29] DeepSeek-V3.2: Pushing the frontier of open large language models. arXiv:2512.02556.

  29. [30] FMDLlama: Financial misinformation detection based on large language models. Companion Proceedings of the ACM on Web Conference 2025.

  30. [31] Same claim, different judgment: Benchmarking scenario-induced bias in multilingual financial misinformation detection. arXiv:2601.05403.

  31. [32] Investigating online financial misinformation and its consequences: A computational perspective. arXiv:2309.12363.

  32. [33] Financial misinformation and trading manipulation using large language models (LLMs). International IOT, Electronics and Mechatronics Conference, 2025.

  33. [34] All that glisters is not gold: A benchmark for reference-free counterfactual financial misinformation detection. arXiv:2601.04160.

  34. [35] Large language models in finance: A survey. Proceedings of the Fourth ACM International Conference on AI in Finance.

  35. [36] FinDVer: Explainable claim verification over long and hybrid-content financial documents. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing.

  36. [37] FinNLP-FNP-LLMFinLegal-2025 shared task: Financial misinformation detection challenge task. Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal).

  37. [38] Dunamu ML at the financial misinformation detection challenge task: Improving supervised fine-tuning with LLM-based data augmentation. Proceedings of the Joint Workshop of FinNLP, FNP, and LLMFinLegal.

  38. [39] FMD-mllama at the financial misinformation detection challenge task: Multimodal reasoning and evidence generation. Proceedings of the Joint Workshop of FinNLP, FNP, and LLMFinLegal.

  39. [40] Capybara at the financial misinformation detection challenge task: Chain-of-thought enhanced financial misinformation detection. Proceedings of the Joint Workshop of FinNLP, FNP, and LLMFinLegal.

  40. [41] DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv:2501.12948.

  41. [42] Fino1: On the transferability of reasoning-enhanced LLMs and reinforcement learning to finance. arXiv:2502.08127.

  42. [43] RAAR: Retrieval augmented agentic reasoning for cross-domain misinformation detection. arXiv:2601.04853.

  43. [44] Fact-R1: Towards explainable video misinformation detection with deep reasoning.

  44. [45] CHEF: A pilot Chinese dataset for evidence-based fact-checking. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

  45. [46] MDFEND: Multi-domain fake news detection. Proceedings of the 30th ACM International Conference on Information & Knowledge Management.

  46. [47] BanMANI: A dataset to identify manipulated social media news in Bangla. Proceedings of the Workshop on Computational Terminology in NLP and Translation Studies (ConTeNTS) Incorporating the 16th Workshop on Building and Using Comparable Corpora (BUCC).

  47. [48] PIXIU: A large language model, instruction data and evaluation benchmark for finance. arXiv:2306.05443.

  48. [49] BloombergGPT: A large language model for finance. arXiv:2303.17564.

  49. [50] ConspEmoLLM-v2: A robust and stable model to detect sentiment-transformed conspiracy theories. arXiv:2505.14917.

  50. [51] HuatuoGPT-o1: Towards medical complex reasoning with LLMs. arXiv:2412.18925.

  51. [52] MentraSuite: Post-training large language models for mental health reasoning and assessment. arXiv:2512.09636.