Hybrid Feature Combinations with CNN for Bangla Fake News Classification
Pith reviewed 2026-05-20 12:48 UTC · model grok-4.3
The pith
Combining semantic, statistical, and character-level features with a CNN improves recall and F1 scores for Bangla fake news detection over single-feature baselines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
On the BanFakeNews-2.0 dataset a CNN classifier reaches its highest recall and F1 scores when semantic, statistical, and character-level features are supplied together rather than when any single feature group is used alone.
What carries the argument
Hybrid feature combinations (semantic plus statistical plus character-level) fed into a convolutional neural network for binary classification of Bangla news articles.
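The review does not say how the three feature families are joined before they reach the CNN. Here is a minimal sketch, assuming simple early fusion. Each family's vector for an article is L2-normalized, so that a large raw value such as a character count cannot swamp the other blocks. The blocks are then concatenated in a fixed order. The family names, the toy vectors and the `fuse_features` helper are hypothetical.

```python
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(blocks: Dict[str, Sequence[float]], order: Sequence[str]) -> List[float]:
    """Early fusion: normalize each feature family's vector, then concatenate in a fixed order."""
    fused: List[float] = []
    for name in order:
        fused.extend(l2_normalize(blocks[name]))
    return fused


# Hypothetical vectors for one article (values are invented).
article = {
    "semantic": [0.3, 0.4],       # e.g. a slice of an averaged word embedding
    "statistical": [120.0, 0.0],  # e.g. character count, digit ratio
    "char": [0.0, 1.0, 0.0],      # e.g. character n-gram TF-IDF weights
}
x = fuse_features(article, ["semantic", "statistical", "char"])
print(len(x))  # 7
```

Per-block normalization is only one option. Standardizing each feature, weighting blocks, or fusing them inside the network are equally plausible, so the sketch should not be read as the paper's method.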
If this is right
- The best-performing model uses all three feature families together rather than any subset.
- Recall improves more than precision when the hybrid set is used, so the detector misses fewer fake articles.
- The same feature-selection step can be repeated on new Bangla news collections without changing the CNN architecture.
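The claim is stated in terms of recall and F1, so it helps to see how both are computed for the fake class. The sketch below uses invented toy predictions, not the paper's results. In it, a hypothetical hybrid model catches one more fake article than a single-feature model. Recall rises from 0.5 to 0.75, while precision rises less, from about 0.667 to 0.75.

```python
from typing import Sequence, Tuple


def prf_for_class(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one class (here fake = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of fake articles that were caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels: 1 = fake, 0 = real (invented for illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
single = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]  # misses two fakes, one false alarm
hybrid = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # misses one fake, one false alarm
print(prf_for_class(y_true, single))  # precision ~0.667, recall 0.5, F1 ~0.571
print(prf_for_class(y_true, hybrid))  # (0.75, 0.75, 0.75)
```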
Where Pith is reading between the lines
- The reported gains suggest that fake-news detection pipelines for low-resource languages may be strengthened by adding cheap statistical and character-level counts instead of switching to larger models.
- If the hybrid advantage holds on other South Asian languages, the same feature recipe could serve as a quick baseline before language-specific tuning begins.
Load-bearing premise
The labels in the BanFakeNews-2.0 dataset correctly mark real and fake articles, and the feature extraction process does not create artificial performance gains.
What would settle it
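Run the same CNN pipeline on a copy of the dataset whose labels have been randomly permuted. Every feature configuration should then fall to chance level. If the hybrid configuration still outperforms the single-feature baselines, the original gains are likely an artifact of the pipeline (for example, leakage during feature extraction or selection) rather than genuine feature complementarity. Label correctness itself is not tested by this check; it would need an independent audit, such as re-annotating a random sample of articles.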
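The permutation check can be organized as below. This is a sketch under stated assumptions. `evaluate` is a hypothetical callable that re-runs the full pipeline (feature extraction plus CNN training) on the given labels and returns a held-out F1 score. The two stub evaluators exist only to show how the output is read. Shuffling breaks the link between article text and label, so any gap that survives it cannot come from genuine feature complementarity.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

# Hypothetical: retrain the whole pipeline on these labels, return held-out F1.
Evaluate = Callable[[Sequence[int]], float]


def permute_labels(labels: Sequence[int], seed: int) -> List[int]:
    """Shuffled copy: class balance is kept, the link between text and label is broken."""
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def permuted_scores(evaluate: Evaluate, labels: Sequence[int], n_rounds: int = 20, seed: int = 0) -> List[float]:
    """Scores of one pipeline over n_rounds independent label permutations."""
    rng = random.Random(seed)
    return [evaluate(permute_labels(labels, rng.randrange(2**32))) for _ in range(n_rounds)]


def permuted_gap(hybrid: Evaluate, single: Evaluate, labels: Sequence[int], n_rounds: int = 20) -> float:
    """Mean hybrid-minus-single score on shuffled labels (same seed, so same permutations)."""
    return mean(permuted_scores(hybrid, labels, n_rounds)) - mean(permuted_scores(single, labels, n_rounds))


labels = [1] * 30 + [0] * 70  # toy class balance, invented


def honest(y: Sequence[int]) -> float:
    return 0.3  # stub: F1 near chance for a 30% fake class


def leaky(y: Sequence[int]) -> float:
    return 0.8  # stub: stays high despite shuffled labels, the pattern leakage would produce


print(permuted_gap(honest, honest, labels))           # 0.0
print(round(permuted_gap(leaky, honest, labels), 3))  # 0.5
```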
Original abstract
Nowadays, people in Bangladesh frequently rely on the internet and social media for daily news instead of traditional newspapers. However, the spread of false Bangla news through these platforms poses risks and challenges to the credibility of authentic media. Although several studies have been conducted on detecting Bangla fake news, there is still significant room for improvement in this area. To assist people, this research explores the effectiveness of feature selection approaches in identifying appropriate features, such as semantic, statistical, and character-level features, or their combinations, on the BanFakeNews-2.0 dataset for detecting Bangla fake news using a CNN model. The key findings reveal that combining multiple features significantly improves recall and F1-scores compared to using individual features alone. The code for this research is available at https://github.com/gulzar09/Bn_FNews_H.Feature.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper explores feature selection and hybrid combinations of semantic, statistical, and character-level features fed into a CNN classifier for Bangla fake news detection on the BanFakeNews-2.0 dataset. It reports that multi-feature combinations yield notable gains in recall and F1-score relative to single-feature baselines, with code released on GitHub.
Significance. If the performance gains prove robust under rigorous validation, the work would usefully demonstrate the value of feature complementarity for low-resource-language fake-news tasks and could guide practitioners toward hybrid representations in CNN pipelines. The public code release is a positive step toward reproducibility.
major comments (3)
- [Methods] The manuscript supplies no information on train-test split ratios, the hyperparameter search procedure (learning rate, filter sizes, dropout), or whether nested cross-validation was employed when selecting and evaluating feature combinations. Without these details the headline claim that hybrids improve recall/F1 cannot be verified as free of selection bias or multiple-comparison artifacts.
- [Results] No statistical significance tests, confidence intervals, or error bars accompany the reported metrics. Consequently it is impossible to determine whether the observed gains over individual features are reliable or could arise from random variation.
- [Experimental Setup] The description of post-hoc feature selection and combination exploration does not clarify whether the hybrids highlighted in the results were chosen by their performance on the same data used for final reporting. If so, the central claim of genuine complementarity is at risk of inflation.
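The Results comment asks for uncertainty estimates. One standard option for comparing two classifiers on the same test set is a paired bootstrap. You resample test articles with replacement, recompute F1 for both models on each resample, and read a percentile interval off the differences. Below is a minimal sketch. The toy predictions are invented stand-ins for a hybrid and a single-feature model.

```python
import random
from typing import List, Sequence, Tuple


def f1_fake(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the fake class (label 1), written as 2TP / (2TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def paired_bootstrap_ci(y_true: Sequence[int], pred_a: Sequence[int], pred_b: Sequence[int],
                        n_boot: int = 2000, alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile CI for F1(a) - F1(b); both models are scored on the same resampled items."""
    rng = random.Random(seed)
    n = len(y_true)
    diffs: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample test articles with replacement
        t = [y_true[i] for i in idx]
        diffs.append(f1_fake(t, [pred_a[i] for i in idx]) - f1_fake(t, [pred_b[i] for i in idx]))
    diffs.sort()
    k = round(n_boot * alpha / 2)
    return diffs[k], diffs[n_boot - 1 - k]


y_true = [1] * 40 + [0] * 60
hybrid = [1] * 36 + [0] * 64  # hypothetical: 36 of 40 fakes caught, no false alarms
single = [1] * 28 + [0] * 72  # hypothetical: 28 of 40 fakes caught, no false alarms
low, high = paired_bootstrap_ci(y_true, hybrid, single)
print(f"95% CI for F1(hybrid) - F1(single): [{low:.3f}, {high:.3f}]")
```

If the interval excludes zero, the F1 gap is unlikely to be an artifact of which articles landed in the test set. The interval does not capture variation across training runs. Retraining with several random seeds and testing the paired per-seed differences would address that part.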
minor comments (2)
- [Abstract] The abstract states that 'key findings reveal that combining multiple features significantly improves recall and F1-scores' but does not quantify the absolute or relative gains; adding concrete numbers would strengthen the summary.
- [Feature Extraction] Notation for the three feature families (semantic, statistical, character-level) is introduced without explicit definitions or formulas; a short table or equations would improve clarity.
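The minor comment asks for explicit feature definitions. As an illustration, here is a hypothetical set of statistical text features and raw character n-gram counts for Bangla text. The specific statistics are assumptions, not the paper's list:

- character count
- word count
- sentence count
- average word length
- digit ratio
- exclamation count

Sentences are split on the Bangla dari (।) as well as `!` and `?`. Lengths are counted in Unicode code points, so a Bangla consonant with a vowel sign counts as more than one character.

```python
import re
from collections import Counter
from typing import Dict

# Sentence boundaries: the Bangla dari (U+0964) plus ASCII "!" and "?".
SENTENCE_END = re.compile(r"[\u0964!?]+")


def statistical_features(text: str) -> Dict[str, float]:
    """Cheap surface statistics for one article (a hypothetical feature set)."""
    words = text.split()
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    n_chars = len(text)  # Unicode code points, not grapheme clusters
    return {
        "n_chars": float(n_chars),
        "n_words": float(len(words)),
        "n_sentences": float(len(sentences)),
        "avg_word_len": sum(len(w) for w in words) / len(words) if words else 0.0,
        # str.isdigit() is True for Bangla digits (U+09E6..U+09EF) as well as 0-9.
        "digit_ratio": sum(c.isdigit() for c in text) / n_chars if n_chars else 0.0,
        "exclaim_count": float(text.count("!")),
    }


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Raw character n-gram counts, with a space pad marking the text boundaries."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


sample = "আজ ঢাকায় বৃষ্টি হয়েছে। খবরটি কি সত্য?"
print(statistical_features(sample)["n_sentences"])  # 2.0
```

The paper names Character-Level TF-IDF among its extraction methods. Such a scheme would reweight n-gram counts like these by inverse document frequency across the corpus.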
Simulated Author's Rebuttal
We thank the referee for the constructive comments and suggestions. We address each of the major comments below and will incorporate the necessary revisions to improve the manuscript's clarity and rigor.
Point-by-point responses
-
Referee: The manuscript supplies no information on train-test split ratios, the hyperparameter search procedure (learning rate, filter sizes, dropout), or whether nested cross-validation was employed when selecting and evaluating feature combinations. Without these details the headline claim that hybrids improve recall/F1 cannot be verified as free of selection bias or multiple-comparison artifacts.
Authors: We agree with the referee that these methodological details are crucial. Our experiments used an 80:20 train-test split. Hyperparameters were selected using grid search on the training portion, with specific ranges for learning rate, filter sizes, and dropout rates. Nested cross-validation was not used. In the revised version, we will provide a comprehensive description of the experimental protocol, including these details and a discussion of potential limitations regarding selection bias. (Revision: yes)
-
Referee: No statistical significance tests, confidence intervals, or error bars accompany the reported metrics. Consequently it is impossible to determine whether the observed gains over individual features are reliable or could arise from random variation.
Authors: We acknowledge this limitation in the current manuscript. To address it, we will conduct statistical significance testing (e.g., using paired t-tests) and include confidence intervals and error bars in the results section of the revised manuscript. This will help demonstrate the reliability of the performance gains. (Revision: yes)
-
Referee: The description of post-hoc feature selection and combination exploration does not clarify whether performance on the same data used for final reporting was used to choose which hybrids to highlight. If so, the central claim of genuine complementarity is at risk of inflation.
Authors: We appreciate the concern regarding potential data leakage in feature selection. In our work, feature combinations were explored and selected using cross-validation on the training data, with final evaluation performed on an independent test set. We will revise the experimental setup section to explicitly detail this process and the measures taken to avoid overfitting to the test data. (Revision: yes)
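The responses describe three steps:

- an 80:20 train-test split;
- grid search over learning rate, filter size and dropout on the training portion;
- feature combinations chosen by cross-validation on training data, with the test set used only for final evaluation.

Taken together, these steps can be sketched as below. Several details are assumptions. The split is stratified by class. The grid values are placeholders. The short family names stand for the six extraction methods the paper lists (TF-IDF, Word2Vec, FastText, N-Gram, Character-Level TF-IDF, Statistical Text Features). `score_fn` is a hypothetical callable. Combinations and hyperparameters are searched jointly here, which the responses do not specify.

```python
import itertools
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Short names for the six extraction methods the paper lists (the names are ours).
FAMILIES = ("tfidf", "word2vec", "fasttext", "ngram", "char_tfidf", "stats")
# Placeholder hyperparameter grid; the actual ranges are not reported here.
GRID: Dict[str, List[float]] = {
    "learning_rate": [1e-3, 5e-4],
    "filter_size": [3, 5],
    "dropout": [0.3, 0.5],
}

Combo = Tuple[str, ...]
Config = Dict[str, float]
Fold = Tuple[List[int], List[int]]
# Hypothetical: extract `combo` features, train the CNN with `config` on fit_idx,
# and return F1 on val_idx.
ScoreFn = Callable[[Combo, Config, List[int], List[int]], float]


def stratified_split(labels: Sequence[int], test_frac: float = 0.2, seed: int = 42) -> Tuple[List[int], List[int]]:
    """80:20 split by default, drawing test_frac of each class into the test set."""
    rng = random.Random(seed)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * test_frac)
        test_idx.extend(idx[:cut])
        train_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)


def all_combinations(families: Sequence[str]) -> List[Combo]:
    """Every non-empty subset of feature families: 2**k - 1 of them."""
    return [c for r in range(1, len(families) + 1) for c in itertools.combinations(families, r)]


def grid_configs() -> List[Config]:
    """Cartesian product of the hyperparameter grid."""
    keys = list(GRID)
    return [dict(zip(keys, values)) for values in itertools.product(*(GRID[k] for k in keys))]


def k_fold(indices: Sequence[int], k: int = 5) -> List[Fold]:
    """Round-robin k-fold over the given training indices; shuffle them first in practice."""
    return [([idx for pos, idx in enumerate(indices) if pos % k != f],
             [idx for pos, idx in enumerate(indices) if pos % k == f]) for f in range(k)]


def select_on_train(train_idx: Sequence[int], score_fn: ScoreFn, k: int = 5) -> Tuple[Combo, Config]:
    """Pick (feature combination, hyperparameters) by mean CV score on training data only.

    Test indices are never passed in, so they cannot influence the choice.
    Ties go to the first candidate in enumeration order.
    """
    folds = k_fold(train_idx, k)
    candidates = [(c, g) for c in all_combinations(FAMILIES) for g in grid_configs()]
    return max(candidates, key=lambda cand: mean(score_fn(cand[0], cand[1], fit, val) for fit, val in folds))


labels = [1] * 50 + [0] * 200  # toy class balance, invented
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))                        # 200 50
print(len(all_combinations(FAMILIES)), len(grid_configs()))  # 63 8
```

With 63 combinations and 8 configurations there are 504 candidates, each trained k times. The best cross-validation score is therefore optimistically biased. Only the chosen candidate's score on the untouched test set should be reported as the performance estimate. This is not nested cross-validation, which the authors state was not used.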
Circularity Check
No circularity: empirical ML comparison with no derivation chain
full rationale
The paper is a standard empirical study that evaluates combinations of semantic, statistical, and character-level features fed to a CNN on the BanFakeNews-2.0 dataset. No mathematical derivation, first-principles prediction, or claimed uniqueness theorem is present. The central finding (hybrid features improve recall/F1) is an experimental outcome, not a quantity that reduces to its inputs by construction. No self-definitional steps, fitted parameters renamed as predictions, or load-bearing self-citations appear. The work is self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
free parameters (1)
- CNN hyperparameters (learning rate, filter sizes, dropout)
axioms (1)
- Domain assumption: BanFakeNews-2.0 labels are ground truth with negligible noise
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.
We extracted features … using the following six feature extraction methods: TF-IDF, Word2Vec, FastText, N-Gram, Character-Level TF-IDF, Statistical Text Features … Comprehensive Feature Combination Testing … CNN Architecture
-
IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
Relation between the paper passage and the cited Recognition theorem.
Table IV … tfidf, word2vec, fasttext, char, stats … accuracy 0.91, F1 0.83
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.