Filter-then-Verify: A Multiphase GNN and ModernBERT Framework for Social Engineering Detection in Email Networks
Pith reviewed 2026-05-20 13:39 UTC · model grok-4.3
The pith
A two-stage filter-then-verify framework uses graph neural networks to spot anomalous email patterns and ModernBERT to check message content, reporting 86% recall at the GNN filtering stage and over 92% precision after ModernBERT verification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that combining inductive Graph Neural Networks for structural anomaly detection with a co-attention ModernBERT model for content verification yields a practical, scalable two-stage filter-then-verify system for identifying multi-stage social engineering attacks in email networks. The evidence offered is 86% recall in the GNN filtering stage and over 92% precision after BERT refinement, both measured on the Enron dataset augmented with synthetic campaigns.
What carries the argument
The filter-then-verify pipeline: GNN-based structural filtering first identifies anomalous sender-receiver patterns, and ModernBERT then verifies message context to reduce false positives.
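A minimal sketch of the control flow may help make the two stages concrete. It is not the paper's implementation: the `Email` type, the two scoring callables, both thresholds, and the toy scorers are hypothetical stand-ins for the GNN and the ModernBERT verifier.

```python
from typing import Callable, Iterable, List, NamedTuple


class Email(NamedTuple):
    sender: str
    receiver: str
    body: str


def filter_then_verify(
    emails: Iterable[Email],
    structural_score: Callable[[Email], float],
    content_score: Callable[[Email], float],
    filter_threshold: float = 0.3,   # assumed value, tuned for recall
    verify_threshold: float = 0.7,   # assumed value, tuned for precision
) -> List[Email]:
    """Return the emails flagged by both stages."""
    # Stage 1: cheap, recall-oriented structural filter over every email.
    candidates = [e for e in emails if structural_score(e) >= filter_threshold]
    # Stage 2: costly, precision-oriented content check on candidates only.
    return [e for e in candidates if content_score(e) >= verify_threshold]


# Toy stand-ins for the two models (hypothetical, for illustration only).
known_pairs = {("alice", "bob")}


def toy_structural_score(e: Email) -> float:
    # Unseen sender-receiver pairs are treated as structurally anomalous.
    return 0.0 if (e.sender, e.receiver) in known_pairs else 1.0


def toy_content_score(e: Email) -> float:
    # Crude keyword cue standing in for a language model.
    return 1.0 if "password" in e.body.lower() else 0.0


inbox = [
    Email("alice", "bob", "Lunch at noon?"),
    Email("mallory", "bob", "Please send me your password urgently."),
    Email("carol", "bob", "Nice to meet you at the conference."),
]
flagged = filter_then_verify(inbox, toy_structural_score, toy_content_score)
```

In this toy run the message from `carol` passes the structural filter as a new contact but is dropped by the content check, which is the false-positive reduction the second stage is meant to provide.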
If this is right
- The approach detects both external attacks and insider threats within the same email network.
- Structural filtering provides high recall while content verification keeps precision high enough for operational use.
- The framework scales to large email networks because the GNN stage is inductive and the BERT stage is applied only to filtered candidates.
- Multi-stage campaigns that unfold over sequences of messages can be caught by the combined structural and content signals.
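The scalability point depends on the GNN being inductive: a node that was absent at training time can still be embedded from its own features and those of its neighbors, with no retraining. The sketch below shows one mean-aggregator layer in the style of GraphSAGE (Hamilton et al., 2017). The review does not state the paper's exact architecture, so the layer form, the two-dimensional account features, and the weight values are all assumptions.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[Vector]


def mean_vector(vectors: Sequence[Vector], dim: int) -> Vector:
    """Element-wise mean; a zero vector if the node has no neighbors."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sage_mean_layer(
    node_feat: Vector,
    neighbor_feats: Sequence[Vector],
    w_self: Matrix,
    w_neigh: Matrix,
) -> Vector:
    """ReLU(W_self @ x + W_neigh @ mean(neighbor features))."""
    agg = mean_vector(neighbor_feats, len(node_feat))
    out = []
    for row_self, row_neigh in zip(w_self, w_neigh):
        z = sum(a * b for a, b in zip(row_self, node_feat))
        z += sum(a * b for a, b in zip(row_neigh, agg))
        out.append(max(0.0, z))
    return out


# Weights would normally be learned; fixed here for illustration.
w_self = [[1.0, 0.0], [0.0, 1.0]]
w_neigh = [[0.5, 0.0], [0.0, -1.0]]

# A sender never seen during training, with two known correspondents.
# Hypothetical features: [messages per day, fraction of external recipients].
new_sender = [2.0, 1.0]
correspondents = [[4.0, 0.0], [2.0, 1.0]]
embedding = sage_mean_layer(new_sender, correspondents, w_self, w_neigh)
```

Because the layer only needs features and a neighbor list, the same weights apply to any account that joins the network later.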
Where Pith is reading between the lines
- If the reported numbers hold on live traffic, organizations could insert the pipeline as a lightweight pre-filter before traditional spam or phishing gateways.
- The same structural-plus-content split might transfer to other trust-exploitation settings such as messaging platforms or enterprise collaboration tools.
- Extending the GNN to incorporate timing or attachment metadata could further improve detection of campaigns that rely on gradual relationship building.
Load-bearing premise
The synthetic social engineering campaigns added to the Enron dataset are realistic enough that performance measured on this mixture will generalize to real-world email traffic and attacks.
What would settle it
Running the trained framework on a separate set of verified social engineering emails from actual incidents and checking whether recall stays near 86% and precision stays above 92%.
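Such a check amounts to computing recall and precision at each stage against ground-truth labels. The sketch below does this over sets of message IDs; the IDs and set contents are invented for illustration and are not data from the paper.

```python
from typing import Set


def recall(flagged: Set[int], positives: Set[int]) -> float:
    """Fraction of true attacks that were flagged."""
    return len(flagged & positives) / len(positives) if positives else 0.0


def precision(flagged: Set[int], positives: Set[int]) -> float:
    """Fraction of flagged messages that are true attacks."""
    return len(flagged & positives) / len(flagged) if flagged else 0.0


# Hypothetical message IDs, not taken from the paper.
true_attacks = {1, 2, 3, 4, 5}
after_filter = {1, 2, 3, 4, 10, 11, 12, 13}   # stage 1 output
after_verify = {1, 2, 3, 10}                  # stage 2 output, subset of stage 1

stage1_recall = recall(after_filter, true_attacks)
stage2_precision = precision(after_verify, true_attacks)
end_to_end_recall = recall(after_verify, true_attacks)
```

Since verification can only remove candidates, end-to-end recall can never exceed stage-1 recall. The headline figures pair stage-1 recall with post-verification precision, so a replication would ideally report end-to-end recall as well.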
Original abstract
Social engineering attacks exploit human trust rather than software vulnerabilities, making them difficult to detect using conventional filters. We propose a two-stage filter-then-verify framework combining inductive Graph Neural Networks (GNNs) for structural anomaly detection with a co-attention ModernBERT model for content verification. The GNN identifies anomalous sender-receiver patterns, while BERT analyzes message context to reduce false positives. Using the Enron dataset augmented with realistic synthetic campaigns, we show that the framework achieves 86% recall in structural filtering and over 92% precision after BERT refinement, effectively detecting both external attacks and insider threats. Our results demonstrate that combining structural and content analysis allows practical, scalable detection of multi-stage social engineering attacks in email networks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a two-stage filter-then-verify framework for detecting social engineering attacks in email networks. Inductive Graph Neural Networks (GNNs) perform structural anomaly detection on sender-receiver patterns, followed by a co-attention ModernBERT model for content verification to reduce false positives. The system is evaluated on the Enron dataset augmented with realistic synthetic social engineering campaigns and reports 86% recall after structural filtering and over 92% precision after BERT refinement, claiming effective detection of both external attacks and insider threats via combined structural and content analysis.
Significance. If the results hold under proper validation, the multiphase GNN-then-ModernBERT pipeline could offer a practical, scalable method for identifying multi-stage social engineering attacks that exploit both network anomalies and linguistic cues. The approach addresses limitations of single-modality detectors and demonstrates how structural filtering can be paired with modern language models for improved precision. The significance is limited by the absence of external validation for the synthetic data and missing experimental controls.
major comments (2)
- [Dataset and Evaluation] The manuscript augments the Enron dataset with 'realistic synthetic campaigns' but supplies no external anchoring such as comparison to documented real-world incidents, expert labeling of the synthetics, or distribution-shift metrics between synthetic and authentic attack traces. This assumption is load-bearing for the central claim because the headline metrics (86% structural recall and >92% final precision) are measured exclusively on the self-constructed mixture.
- [Experimental Results] The abstract and results report concrete performance figures (86% recall, >92% precision) yet supply no experimental protocol, baseline comparisons, error bars, or ablation results on the GNN filtering stage versus the BERT refinement stage. Without these controls it is impossible to determine whether the reported gains arise from the proposed pipeline or from dataset construction choices.
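The error bars and ablations requested here could be reported along the following lines. This is a sketch only: the configuration names and every score are placeholder values chosen for easy arithmetic, not results from the paper.

```python
import statistics
from typing import Dict, List, Tuple


def mean_std(scores: List[float]) -> Tuple[float, float]:
    """Sample mean and sample standard deviation across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)


# Placeholder precision values for five seeds per configuration.
# These numbers are invented for illustration only.
runs: Dict[str, List[float]] = {
    "gnn_only": [0.3, 0.4, 0.5, 0.4, 0.4],
    "bert_only": [0.4, 0.5, 0.6, 0.5, 0.5],
    "gnn_then_bert": [0.5, 0.6, 0.7, 0.6, 0.6],
}

summary = {name: mean_std(scores) for name, scores in runs.items()}
for name, (mean, std) in summary.items():
    print(f"{name}: {mean:.3f} ± {std:.3f}")
```

Reporting each ablation row in the same mean ± std form makes it visible whether the gap between the full pipeline and either single stage exceeds seed-to-seed variation.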
minor comments (2)
- [Abstract] The abstract uses the vague phrase 'over 92% precision'; reporting the exact value and confidence interval would improve clarity.
- [Model Architecture] Clarify how the GNN node embeddings are passed as input or conditioning to the co-attention ModernBERT model.
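For the first minor comment, one common way to attach a confidence interval to a precision figure is the Wilson score interval for a binomial proportion. The sketch below assumes that method (the paper may prefer another), and the counts are hypothetical, since the review does not give the number of flagged messages.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials))
    ) / denom
    return center - half_width, center + half_width


# Hypothetical counts: 92 true attacks among 100 flagged messages.
low, high = wilson_interval(successes=92, trials=100)
```

The width of the interval depends strongly on how many messages survive the filter, which is one reason the exact counts matter alongside the percentage.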
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments on our manuscript. The feedback highlights important aspects of validation and experimental rigor that we will address to strengthen the paper. We respond to each major comment below and indicate planned revisions.
- Referee: [Dataset and Evaluation] The manuscript augments the Enron dataset with 'realistic synthetic campaigns' but supplies no external anchoring such as comparison to documented real-world incidents, expert labeling of the synthetics, or distribution-shift metrics between synthetic and authentic attack traces. This assumption is load-bearing for the central claim because the headline metrics (86% structural recall and >92% final precision) are measured exclusively on the self-constructed mixture.
Authors: We agree that stronger validation of the synthetic data would increase confidence in the results. The synthetic campaigns were generated by overlaying documented social engineering tactics (drawn from public reports such as the Verizon DBIR and known phishing/impersonation patterns) onto the real Enron email graph while preserving temporal and structural properties. In the revised manuscript we will add an expanded dataset section that details the exact generation procedure, provides quantitative distribution-shift metrics (e.g., KL divergence on degree, temporal, and linguistic features between synthetic attacks and authentic Enron messages), and cites the specific real-world case studies used to calibrate the synthetics. Direct expert labeling of the full synthetic set or matching to particular confidential incidents is not feasible within academic constraints; however, the added methodological transparency and statistical comparisons will better anchor the evaluation. revision: partial
- Referee: [Experimental Results] The abstract and results report concrete performance figures (86% recall, >92% precision) yet supply no experimental protocol, baseline comparisons, error bars, or ablation results on the GNN filtering stage versus the BERT refinement stage. Without these controls it is impossible to determine whether the reported gains arise from the proposed pipeline or from dataset construction choices.
Authors: We acknowledge that the presentation of experimental controls was insufficient. The full manuscript already describes the data splits, hyper-parameters, and evaluation protocol in Section 4, but we agree these details should be more prominent and supplemented with additional analyses. In the revision we will insert a dedicated experimental protocol subsection, add baseline comparisons (standalone inductive GNN, standalone ModernBERT, traditional ML classifiers on hand-crafted features), include ablation tables that isolate the contribution of the structural filter versus the content verifier, and report all headline metrics as mean ± standard deviation across five random seeds. These additions will allow readers to attribute performance gains to the proposed multiphase design rather than dataset artifacts. revision: yes
- Direct expert labeling or one-to-one matching of synthetic attacks to specific real-world incidents, which would require access to proprietary or sensitive operational data unavailable for this study.
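The distribution-shift comparison proposed in the rebuttal (KL divergence on degree, temporal, and linguistic features) can be sketched for the degree case as follows. The degree lists, the bin range, and the additive smoothing constant are assumptions made for illustration; the authors' actual procedure is not described in this review.

```python
import math
from collections import Counter
from typing import List, Sequence


def smoothed_histogram(values: Sequence[int], bins: Sequence[int], eps: float = 1e-6) -> List[float]:
    """Normalized histogram with additive smoothing so no bin is exactly zero."""
    counts = Counter(values)
    total = len(values) + eps * len(bins)
    return [(counts.get(b, 0) + eps) / total for b in bins]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; assumes q has no zero entries where p is positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical per-sender degrees (number of distinct recipients).
authentic_degrees = [1, 1, 2, 2, 2, 3, 3, 4]
synthetic_degrees = [1, 4, 4, 5, 5, 5, 6, 6]

bins = range(0, 7)
p = smoothed_histogram(synthetic_degrees, bins)
q = smoothed_histogram(authentic_degrees, bins)

shift = kl_divergence(p, q)
self_shift = kl_divergence(q, q)
```

KL divergence is asymmetric, so a report would need to state which distribution is the reference. A large value here would indicate that synthetic attackers are structurally easy to separate from authentic senders, which would flatter the GNN stage.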
Circularity Check
No circularity: empirical results on held-out test set with no derivations or self-referential definitions
The manuscript presents an applied two-stage detection pipeline (inductive GNN for structural anomalies followed by co-attention ModernBERT for content verification) and reports performance numbers (86% structural recall, >92% final precision) as measured outcomes on a held-out portion of the Enron dataset after augmentation with synthetic campaigns. No equations, parameter-fitting steps that are then re-labeled as predictions, or self-citation chains appear in the derivation chain. The central claims rest on standard empirical evaluation rather than quantities defined in terms of themselves or reductions by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- GNN and BERT training hyperparameters
axioms (1)
- domain assumption: The mixture of real Enron emails and researcher-generated synthetic attack campaigns is statistically representative of live organizational email traffic and real social-engineering attempts.