arxiv: 2605.20032 · v1 · pith:23CFEZRSnew · submitted 2026-05-19 · 💻 cs.LG · cs.MM

CAMERA: Adapting to Semantic Camouflage in Unsupervised Text-Attributed Graph Fraud Detection

Pith reviewed 2026-05-20 06:50 UTC · model grok-4.3

classification 💻 cs.LG cs.MM
keywords: text-attributed graphs · unsupervised fraud detection · semantic camouflage · mixture of experts · one-class learning · graph neural networks · adaptive gating

The pith

A mixture-of-experts model with adaptive gating and one-class objectives detects fraudsters who camouflage text in unsupervised graph settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that semantic camouflage, where fraudsters deliberately mimic benign users' text, breaks standard assumptions about structural and attribute differences in text-attributed graphs, making unsupervised fraud detection difficult. It introduces CAMERA, which uses an ego-decoupled mixture-of-experts setup so each expert focuses on a distinct cue, a context-informed gate that combines them based on the node and its neighbors, and expert-level one-class objectives that exploit fraud rarity to learn dominant benign patterns. This setup enables separation of camouflaged cases without labels. A sympathetic reader would care because it turns the rarity of fraud into an advantage for reliable detection on platforms where mimicry is common.

Core claim

CAMERA employs an ego-decoupled mixture-of-experts architecture where each expert specializes in modeling a distinct type of fraud-indicative cue, introduces a context-informed gating model to jointly consider the ego node representation and its local neighborhood context for adaptive integration of cues, and leverages the inherent rarity of fraudsters to support unsupervised one-class learning with expert-level objectives that encourage modeling dominant benign patterns, thereby enabling reliable unsupervised detection of camouflaged fraudsters.

What carries the argument

The ego-decoupled mixture-of-experts architecture with context-informed gating and expert-level one-class objectives, which lets separate experts handle distinct cues while the gate adapts their combination to local context and the one-class step isolates rare deviations.
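The interplay between the gate and the experts can be illustrated with a minimal Python sketch. It is not the authors' implementation: the linear gate, the mean-pooled neighborhood context, and every name (`gate_weights`, `mix_experts`, `gate_matrix`) are assumptions made for illustration. Only the idea of weighting experts from the ego representation together with its local context comes from the paper's description.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def gate_weights(ego, neighbors, gate_matrix):
    """One weight per expert, computed from [ego ; mean(neighbors)].

    The linear gate and the mean pooling are illustrative assumptions.
    """
    context = mean_vector(neighbors) if neighbors else [0.0] * len(ego)
    features = ego + context  # concatenate ego and context features
    logits = [sum(w * x for w, x in zip(row, features)) for row in gate_matrix]
    return softmax(logits)

def mix_experts(expert_outputs, weights):
    """Gate-weighted sum of the per-expert output vectors."""
    dim = len(expert_outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, expert_outputs))
            for d in range(dim)]

# Toy values, made up for the example.
ego = [1.0, 0.0]
neighbors = [[0.0, 1.0], [0.0, 3.0]]
gate_matrix = [
    [1.0, 0.0, 0.0, 0.0],  # hypothetical expert 0 reacts to the ego features
    [0.0, 0.0, 0.0, 1.0],  # hypothetical expert 1 reacts to the context
]
weights = gate_weights(ego, neighbors, gate_matrix)
expert_outputs = [[0.2, 0.1], [0.9, 0.4]]
mixed = mix_experts(expert_outputs, weights)
```

In this toy case the neighborhood context dominates the gate input, so the second expert receives the larger weight; a node with a different neighborhood would get a different mixture from the same experts.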

If this is right

  • It allows unsupervised TAGFD to remain effective even when camouflage undermines usual structural and attribute signals.
  • Expert specialization and adaptive gating together support cue integration without any labeled fraud examples.
  • One-class objectives at the expert level turn the scarcity of fraud into a stable modeling target for benign behavior.
  • Performance gains appear consistently across four challenging datasets against prior unsupervised methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar expert specialization and context gating could apply to other anomaly tasks where adversaries evolve to mimic normal behavior, such as spam or bot detection.
  • The approach suggests testing whether adding more cue types or dynamic expert addition improves handling of new camouflage tactics over time.
  • If the rarity assumption weakens, hybrid semi-supervised extensions that incorporate a small number of confirmed cases might maintain separation.
  • This framing highlights a broader pattern in graph learning where modeling what is normal can be more robust than chasing shifting anomalies directly.

Load-bearing premise

Fraudsters remain rare enough that their presence does not disrupt the ability of one-class objectives to reliably capture the dominant benign patterns across experts.
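A small Python sketch may make the premise concrete. It assumes the scoring rule that appears in the paper's algorithm excerpt, where a node's score is the sigmoid of its embedding norm and training applies binary cross-entropy against the target 0 for every node. The function names and toy embeddings are hypothetical, and the softplus rewrite is a standard numerical identity rather than something taken from the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def anomaly_score(h):
    """Score in [0.5, 1): sigmoid of the embedding's L2 norm."""
    return sigmoid(l2_norm(h))

def softplus(x):
    """Stable log(1 + exp(x)), which equals -log(1 - sigmoid(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def one_class_loss(embeddings):
    """Mean BCE(s_i, 0) over all nodes, treating every node as benign."""
    return sum(softplus(l2_norm(h)) for h in embeddings) / len(embeddings)

# Toy embeddings, made up for the example: many small-norm nodes, one outlier.
benign = [[0.1, 0.0], [0.0, 0.2], [0.1, 0.1]]
outlier = [3.0, 4.0]
scores = [anomaly_score(h) for h in benign + [outlier]]
loss = one_class_loss(benign + [outlier])
```

Because every node is pushed toward target 0, the loss is dominated by whatever is common. If fraudsters stop being rare, their patterns are also absorbed into the "normal" region and the score loses contrast, which is why the premise is load-bearing.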

What would settle it

A dataset in which fraudsters become common enough to alter overall patterns or achieve perfect semantic mimicry of benign text would cause detection performance to fall to levels indistinguishable from random guessing.
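For reference, "indistinguishable from random guessing" corresponds to an AUROC near 0.5, AUROC being the metric named in the caption of Figure 8. The sketch below is a plain pairwise implementation written for illustration; the function name and the toy scores are made up, and it is not the paper's evaluation code.

```python
def auroc(scores, labels):
    """Probability that a random fraud node outscores a random benign node.

    labels: 1 for fraud, 0 for benign. Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one node of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy scores, made up for the example.
separated = auroc([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0])   # fraud always ranked higher
uninformative = auroc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])  # scores carry no signal
```

A detector whose scores carry no information about the label lands at 0.5, which is the level the falsification test above refers to.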

Figures

Figures reproduced from arXiv: 2605.20032 by Alan Wee-Chung Liew, Junjun Pan, Lianhua Chi, Shirui Pan, Yixin Liu, Yu Zheng.

Figure 1: Illustration of fraudster evolution in text-attributed graphs,
Figure 2: Left: Local affinity of different datasets with and without
Figure 3: Overall framework of CAMERA. 4 Methodology In this section, we provide an overview of CAMERA. As shown in
Figure 4: Ablation on (L): #experts and (R): gating mechanism.
Figure 5: Visualization of expert weight. Visualization of Expert Allocation To investigate how CAMERA adaptively integrates fraud-indicative cues across different scenarios, we visualize the gating weights at the dataset and case levels. Dataset-level visualization is shown in
Figure 6: Case study on YelpChi. 6 Conclusion In this paper, we propose a novel unsupervised TAGFD framework, CAMERA, that enables detection against evolved fraudsters that engage in semantic camouflage to mimic benign behaviors and evade detection. By utilizing ego-decoupled MoE architecture together with a one-class unsupervised training objective, CAMERA can adaptively integrate different fraud-indicative cues …
Figure 7: Over-parameterization study on YelpChi and Amazon
Figure 8: Grid search results for α vs. β (surface height indicates AUROC). … layers. Overall, CAMERA scales linearly with the number of nodes and edges, making it suitable for real-world large-scale TAGFD scenarios. E Dataset and Implementation Details Detailed statistics of the datasets are summarized in Table 4. All experiments are conducted on a Windows desktop equipped with 32 GB RAM and an RTX 4090 GPU with 24…
Original abstract

Text-attributed graph fraud detection (TAGFD) plays a critical role in preventing fraudulent activities on online social and e-commerce platforms. However, to evade detection, fraudsters continuously evolve their camouflaging strategies by deliberately mimicking textual responses of benign users, thereby concealing their malicious purposes. This phenomenon, referred to as semantic camouflage, fundamentally undermines commonly relied assumptions on how structural and attribute cues can be exploited to identify fraudsters, and makes it difficult to spot fraudsters with unsupervised TAGFD. To bridge the gaps, we propose a Case-Adaptive Multi-cue Expert fRAmework (CAMERA) for unsupervised TAGFD. CAMERA employs an ego-decoupled mixture-of-experts architecture, where each expert specializes in modeling a distinct type of fraud-indicative cue. A context-informed gating model is introduced to jointly consider the ego node representation and its local neighborhood context for adaptive integration of cues learned by different experts. Furthermore, CAMERA leverages the inherent rarity of fraudsters to support unsupervised one-class learning with expert-level objectives that encourage modeling dominant benign patterns, thereby enabling reliable unsupervised detection of camouflaged fraudsters. Experiments on 4 challenging datasets show that CAMERA consistently outperforms competitors, showing its effectiveness against semantically camouflaged fraudsters. Code available at https://github.com/CampanulaBells/CAMERA

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes CAMERA, a Case-Adaptive Multi-cue Expert fRAmework for unsupervised text-attributed graph fraud detection (TAGFD) to address semantic camouflage where fraudsters mimic benign textual responses. It features an ego-decoupled mixture-of-experts architecture with each expert specializing in distinct fraud-indicative cues, a context-informed gating model that considers ego node and neighborhood context, and expert-level one-class objectives that leverage the inherent rarity of fraudsters to model dominant benign patterns. Experiments on 4 challenging datasets report that CAMERA consistently outperforms competitors.

Significance. If the results hold, this work is significant for tackling an important and evolving challenge in fraud detection on social and e-commerce platforms. The mixture-of-experts design with adaptive cue integration and unsupervised one-class learning offers a novel direction for handling semantic camouflage. The public release of code supports reproducibility and is a clear strength.

major comments (2)
  1. [§3.3] §3.3, expert-level one-class objectives: The central unsupervised separation claim depends on the assumption that fraudster rarity allows each expert to reliably model only dominant benign patterns. No analysis is provided on the case where semantic camouflage aligns fraudster text attributes with benign distributions (potentially across multiple cues), which could cause the mixture to model a blended distribution rather than isolating the rare class.
  2. [§4] §4, experimental results: The claim of consistent outperformance on 4 datasets is load-bearing for the paper's contribution, yet the manuscript lacks details on exact metrics (e.g., AUC or F1), baseline configurations, statistical significance testing, and ablation studies on the number of experts or gating network weights. This makes it difficult to assess robustness against semantic camouflage.
minor comments (2)
  1. [§3.1] The notation for the gating network and expert outputs could be clarified with a single consolidated diagram or table to improve readability of the architecture.
  2. [§1] A few sentences in the introduction repeat the problem motivation without adding new technical context; tightening would improve flow.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We address each major comment point by point below, providing clarifications and indicating revisions made to strengthen the presentation of our contributions.

Point-by-point responses
  1. Referee: [§3.3] §3.3, expert-level one-class objectives: The central unsupervised separation claim depends on the assumption that fraudster rarity allows each expert to reliably model only dominant benign patterns. No analysis is provided on the case where semantic camouflage aligns fraudster text attributes with benign distributions (potentially across multiple cues), which could cause the mixture to model a blended distribution rather than isolating the rare class.

    Authors: We acknowledge that the effectiveness of the expert-level one-class objectives relies on the rarity assumption and that extreme alignment of camouflaged fraudster attributes with benign distributions across cues could in principle lead to modeling challenges. The design of CAMERA mitigates this through ego-decoupled experts that specialize on distinct fraud-indicative cues and a context-informed gating mechanism that adaptively weights them based on both ego and neighborhood information. In the revised manuscript, we have expanded Section 3.3 with additional discussion of this edge case and included new empirical analysis on synthetically modified datasets that increase cross-cue alignment to illustrate that performance degradation remains limited compared to baselines. revision: yes

  2. Referee: [§4] §4, experimental results: The claim of consistent outperformance on 4 datasets is load-bearing for the paper's contribution, yet the manuscript lacks details on exact metrics (e.g., AUC or F1), baseline configurations, statistical significance testing, and ablation studies on the number of experts or gating network weights. This makes it difficult to assess robustness against semantic camouflage.

    Authors: We agree that greater detail on the experimental setup and results is necessary for full assessment of robustness. The revised Section 4 now reports precise AUC and F1 values for CAMERA and all baselines, provides complete hyperparameter configurations and implementation details for each baseline, includes statistical significance testing via paired t-tests with reported p-values, and adds ablation studies examining performance as a function of the number of experts (ranging from 2 to 8) as well as variations in gating network architecture and loss weighting. These updates directly support the robustness claims under semantic camouflage. revision: yes

Circularity Check

0 steps flagged

No circularity: new architecture and objectives are independently defined

full rationale

The paper introduces an ego-decoupled mixture-of-experts architecture, context-informed gating, and expert-level one-class objectives that explicitly leverage the external rarity assumption for unsupervised modeling of benign patterns. No equations, fitted parameters, or self-citations are shown that reduce the central claims or performance metrics to quantities defined from the same evaluation data. The derivation chain consists of architectural proposals and modeling choices that remain self-contained against external benchmarks and do not collapse by construction.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The approach rests on standard unsupervised learning assumptions plus the domain premise that fraud is rare; introduces no new physical entities but relies on learned expert specialization.

free parameters (2)
  • number of experts
    Hyperparameter controlling how many distinct fraud-indicative cues are modeled separately.
  • gating network weights
    Learned parameters that integrate expert outputs based on ego and neighborhood context.
axioms (1)
  • domain assumption: Fraudsters are rare relative to benign users and their patterns deviate from dominant benign behavior.
    Invoked to justify unsupervised one-class learning with expert-level objectives.

pith-pipeline@v0.9.0 · 5782 in / 1145 out tokens · 39311 ms · 2026-05-20T06:50:50.640084+00:00 · methodology

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 2 internal anchors

  1. [1]

    Out-of-distribution detection on graphs: A survey. arXiv preprint arXiv:2502.08105,

    [Cai et al., 2025] Tingyi Cai, Yunliang Jiang, Yixin Liu, Ming Li, Changqin Huang, and Shirui Pan. Out-of-distribution detection on graphs: A survey. arXiv preprint arXiv:2502.08105,

  2. [2]

    Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4):303–314,

    [Cybenko, 1989] George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4):303–314,

  3. [3]

    Deep anomaly detection on attributed networks

    [Ding et al., 2019] Kaize Ding, Jundong Li, Rohit Bhanushali, and Huan Liu. Deep anomaly detection on attributed networks. In Proceedings of the 2019 SIAM international conference on data mining, pages 594–602. SIAM,

  4. [4]

    SpaceGNN: Multi-space graph neural network for node anomaly detection with extremely limited labels

    [Dong et al., 2025] Xiangyu Dong, Xingyi Zhang, Lei Chen, Mingxuan Yuan, and Sibo Wang. SpaceGNN: Multi-space graph neural network for node anomaly detection with extremely limited labels. In The Thirteenth International Conference on Learning Representations,

  5. [5]

    Enhancing graph neural network-based fraud detectors against camouflaged fraudsters

    [Dou et al., 2020] Yingtong Dou, Zhiwei Liu, Li Sun, Yutong Deng, Hao Peng, and Philip S Yu. Enhancing graph neural network-based fraud detectors against camouflaged fraudsters. In Proceedings of the 29th ACM international conference on information & knowledge management, pages 315–324,

  6. [6]

    Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23(120):1–39,

    [Fedus et al., 2022] William Fedus, Barret Zoph, and Noam Shazeer. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23(120):1–39,

  7. [7]

    Addressing heterophily in graph anomaly detection: A perspective of graph spectrum

    [Gao et al., 2023] Yuan Gao, Xiang Wang, Xiangnan He, Zhenguang Liu, Huamin Feng, and Yongdong Zhang. Addressing heterophily in graph anomaly detection: A perspective of graph spectrum. In Proceedings of the ACM web conference 2023, pages 1528–1538,

  8. [8]

    Demix layers: Disentangling domains for modular language modeling

    [Gururangan et al., 2022] Suchin Gururangan, Mike Lewis, Ari Holtzman, Noah A Smith, and Luke Zettlemoyer. Demix layers: Disentangling domains for modular language modeling. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5557–5576,

  9. [9]

    Ada-gad: Anomaly-denoised autoencoders for graph anomaly detection

    [He et al., 2024] Junwei He, Qianqian Xu, Yangbangyan Jiang, Zitai Wang, and Qingming Huang. Ada-gad: Anomaly-denoised autoencoders for graph anomaly detection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 8481–8489,

  10. [10]

    Cost-sensitive gnn-based imbalanced learning for mobile social network fraud detection. IEEE Transactions on Computational Social Systems, 11(2):2675–2690,

    [Hu et al., 2023] Xinxin Hu, Haotian Chen, Hongchang Chen, Shuxin Liu, Xing Li, Shibo Zhang, Yahui Wang, and Xiangyang Xue. Cost-sensitive gnn-based imbalanced learning for mobile social network fraud detection. IEEE Transactions on Computational Social Systems, 11(2):2675–2690,

  11. [11]

    Unsupervised graph outlier detection: Problem revisit, new insight, and superior method

    [Huang et al., 2023] Yihong Huang, Liping Wang, Fan Zhang, and Xuemin Lin. Unsupervised graph outlier detection: Problem revisit, new insight, and superior method. In 2023 IEEE 39th International Conference on Data Engineering (ICDE), pages 2565–2578. IEEE,

  12. [12]

    Anemone: Graph anomaly detection with multi-scale contrastive learning

    [Jin et al., 2021] Ming Jin, Yixin Liu, Yu Zheng, Lianhua Chi, Yuan-Fang Li, and Shirui Pan. Anemone: Graph anomaly detection with multi-scale contrastive learning. In Proceedings of the 30th ACM international conference on information & knowledge management, pages 3122–3126,

  13. [13]

    Auto-Encoding Variational Bayes

    [Kingma and Welling, 2013] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114,

  14. [14]

    Base layers: Simplifying training of large, sparse models

    [Lewis et al., 2021] Mike Lewis, Shruti Bhosale, Tim Dettmers, Naman Goyal, and Luke Zettlemoyer. Base layers: Simplifying training of large, sparse models. In International Conference on Machine Learning, pages 6265–

  15. [15]

    Internet financial fraud detection based on graph learning. IEEE Transactions on Computational Social Systems, 10(3):1394–1401,

    [Li et al., 2022] Ranran Li, Zhaowei Liu, Yuanqing Ma, Dong Yang, and Shuaijie Sun. Internet financial fraud detection based on graph learning. IEEE Transactions on Computational Social Systems, 10(3):1394–1401,

  16. [16]

    Glbench: A comprehensive benchmark for graph with large language models. Advances in Neural Information Processing Systems, 37:42349–42368,

    [Li et al., 2024] Yuhan Li, Peisong Wang, Xiao Zhu, Aochuan Chen, Haiyun Jiang, Deng Cai, Victor W Chan, and Jia Li. Glbench: A comprehensive benchmark for graph with large language models. Advances in Neural Information Processing Systems, 37:42349–42368,

  17. [17]

    Dgp: A dual-granularity prompting framework for fraud detection with graph-enhanced llms

    [Li et al., 2025] Yuan Li, Jun Hu, Bryan Hooi, Bingsheng He, and Cheng Chen. Dgp: A dual-granularity prompting framework for fraud detection with graph-enhanced llms. arXiv preprint arXiv:2507.21653,

  18. [18]

    Towards one-for-all anomaly detection for tabular data

    [Li et al., 2026] Shiyuan Li, Yixin Liu, Yu Zheng, Xiaofeng Cao, Shirui Pan, and Heng Tao Shen. Towards one-for-all anomaly detection for tabular data. In International Conference on Machine Learning (ICML),

  19. [19]

    Pick and choose: a gnn-based imbalanced learning approach for fraud detection

    [Liu et al., 2021a] Yang Liu, Xiang Ao, Zidi Qin, Jianfeng Chi, Jinghua Feng, Hao Yang, and Qing He. Pick and choose: a gnn-based imbalanced learning approach for fraud detection. In Proceedings of the web conference 2021, pages 3168–3177,

  20. [20]

    Diversifying the mixture-of-experts representation for language models with orthogonal optimizer

    [Liu et al., 2023] Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, and Dacheng Tao. Diversifying the mixture-of-experts representation for language models with orthogonal optimizer. In Proceedings of the European Conference on Artificial Intelligence (ECAI). IOS Press,

  21. [21]

    Towards anomaly detection on text-attributed graphs

    [Liu et al., 2025] Xudong Liu, Yanan Ren, Hengtong Zhang, Run-An Wang, Shenghe Zheng, and Zhaonian Zou. Towards anomaly detection on text-attributed graphs. https://openreview.net/forum?id=LMKYd9JHgU, OpenReview preprint.

  22. [22]

    From few-shot to zero-shot: Towards generalist graph anomaly detection

    [Liu et al., 2026] Yixin Liu, Shiyuan Li, Yu Zheng, Qingfeng Chen, Chengqi Zhang, Philip S Yu, and Shirui Pan. From few-shot to zero-shot: Towards generalist graph anomaly detection. arXiv preprint arXiv:2602.18793,

  23. [23]

    A comprehensive survey on graph anomaly detection with deep learning. IEEE transactions on knowledge and data engineering, 35(12):12012–12038,

    [Ma et al., 2021] Xiaoxiao Ma, Jia Wu, Shan Xue, Jian Yang, Chuan Zhou, Quan Z Sheng, Hui Xiong, and Leman Akoglu. A comprehensive survey on graph anomaly detection with deep learning. IEEE transactions on knowledge and data engineering, 35(12):12012–12038,

  24. [24]

    On fake news detection with llm enhanced semantics mining

    [Ma et al., 2024] Xiaoxiao Ma, Yuchen Zhang, Kaize Ding, Jian Yang, Jia Wu, and Hao Fan. On fake news detection with llm enhanced semantics mining. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 508–521,

  25. [25]

    From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews

    [McAuley and Leskovec, 2013] Julian John McAuley and Jure Leskovec. From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. In Proceedings of the 22nd international conference on World Wide Web, pages 897–908,

  26. [26]

    Prem: A simple yet effective approach for node-level graph anomaly detection

    [Pan et al., 2023] Junjun Pan, Yixin Liu, Yizhen Zheng, and Shirui Pan. Prem: A simple yet effective approach for node-level graph anomaly detection. In 2023 IEEE International Conference on Data Mining (ICDM), pages 1253–1258. IEEE,

  27. [27]

    Guard: Effective anomaly detection through a text-rich and graph-informed language model

    [Pang et al., 2025] Yunhe Pang, Bo Chen, Fanjin Zhang, Yanghui Rao, Evgeny Kharlamov, and Jie Tang. Guard: Effective anomaly detection through a text-rich and graph-informed language model. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, pages 2222–2233,

  28. [28]

    Dynhd: Hallucination detection for diffusion large language models via denoising dynamics deviation learning. arXiv preprint arXiv:2603.16459,

    [Qian et al., 2026] Yanyu Qian, Yue Tan, Yixin Liu, Wang Yu, and Shirui Pan. Dynhd: Hallucination detection for diffusion large language models via denoising dynamics deviation learning. arXiv preprint arXiv:2603.16459,

  29. [29]

    Truncated affinity maximization: One-class homophily modeling for graph anomaly detection. Advances in Neural Information Processing Systems, 36:49490–49512,

    [Qiao and Pang, 2023] Hezhe Qiao and Guansong Pang. Truncated affinity maximization: One-class homophily modeling for graph anomaly detection. Advances in Neural Information Processing Systems, 36:49490–49512,

  30. [30]

    Collective opinion spam detection: Bridging review networks and metadata

    [Rayana and Akoglu, 2015] Shebuti Rayana and Leman Akoglu. Collective opinion spam detection: Bridging review networks and metadata. In Proceedings of the 21th acm sigkdd international conference on knowledge discovery and data mining, pages 985–994,

  31. [31]

    Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

    [Reimers and Gurevych, 2019] Nils Reimers and Iryna Gurevych. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv preprint arXiv:1908.10084,

  32. [32]

    Gad-nr: Graph anomaly detection via neighborhood reconstruction

    [Roy et al., 2024] Amit Roy, Juan Shu, Jia Li, Carl Yang, Olivier Elshocht, Jeroen Smeets, and Pan Li. Gad-nr: Graph anomaly detection via neighborhood reconstruction. In Proceedings of the 17th ACM international conference on web search and data mining, pages 576–585,

  33. [33]

    A survey on oversmoothing in graph neural networks,

    [Rusch et al., 2023] T Konstantin Rusch, Michael M Bronstein, and Siddhartha Mishra. A survey on oversmoothing in graph neural networks. arXiv preprint arXiv:2303.10993,

  34. [34]

    Bisecle: Binding and separation in continual learning for video language understanding. Advances in Neural Information Processing Systems, 38:33752–33782,

    [Tan et al., 2025] Yue Tan, Xiaoqian Hu, Hao Xue, Celso De Melo, and Flora Salim. Bisecle: Binding and separation in continual learning for video language understanding. Advances in Neural Information Processing Systems, 38:33752–33782,

  35. [35]

    Rethinking graph neural networks for anomaly detection

    [Tang et al., 2022] Jianheng Tang, Jiajin Li, Ziqi Gao, and Jia Li. Rethinking graph neural networks for anomaly detection. In International conference on machine learning, pages 21076–21089. PMLR,

  36. [36]

    One-class graph neural networks for anomaly detection in attributed networks. Neural computing and applications, 33(18):12073–12085,

    [Wang et al., 2021] Xuhong Wang, Baihong Jin, Ying Du, Ping Cui, Yingshui Tan, and Yupu Yang. One-class graph neural networks for anomaly detection in attributed networks. Neural computing and applications, 33(18):12073–12085,

  37. [37]

    Text-attributed graph anomaly detection via multi-scale cross- and uni-modal contrastive learning. arXiv preprint arXiv:2508.00513,

    [Xu et al., 2025b] Yiming Xu, Xu Hua, Zhen Peng, Bin Shi, Jiarun Chen, Xingbo Fu, Song Wang, and Bo Dong. Text-attributed graph anomaly detection via multi-scale cross- and uni-modal contrastive learning. arXiv preprint arXiv:2508.00513,

  38. [38]

    Grad: Guided relation diffusion generation for graph augmentation in graph fraud detection

    [Yang et al., 2025b] Jie Yang, Rui Zhang, Ziyang Cheng, Dawei Cheng, Guang Yang, and Bo Wang. Grad: Guided relation diffusion generation for graph augmentation in graph fraud detection. In Proceedings of the ACM on Web Conference 2025, pages 5308–5319,

  39. [39]

    Group-based fraud detection network on e-commerce platforms

    [Yu et al., 2023] Jianke Yu, Hanchen Wang, Xiaoyang Wang, Zhao Li, Lu Qin, Wenjie Zhang, Jian Liao, and Ying Zhang. Group-based fraud detection network on e-commerce platforms. In Proceedings of the 29th ACM SIGKDD conference on knowledge discovery and data mining, pages 5463–5475,

  40. [40]

    Freegad: A training-free yet effective approach for graph anomaly detection

    [Zhao et al., 2025] Yunfeng Zhao, Yixin Liu, Shiyuan Li, Qingfeng Chen, Yu Zheng, and Shirui Pan. Freegad: A training-free yet effective approach for graph anomaly detection. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management, pages 4379–4389,

  41. [41]

    Fedcigar: A personalized reconstruction approach for federated graph-level anomaly detection

    [Zhao et al., 2026] Yunfeng Zhao, Yixin Liu, Qingfeng Chen, Shiyuan Li, Yue Tan, and Shirui Pan. Fedcigar: A personalized reconstruction approach for federated graph-level anomaly detection. In International Joint Conference on Artificial Intelligence,

  42. [42]

    Earlier works treat GFD as a class-imbalance classification problem and incorporate techniques such as sampling [Liu et al., 2021a]

    A Related work in details A.1 Fraud Detection on Attributed Graph Graph fraud detection (GFD) aims to identify fraudulent activities in real-world graph applications, such as financial fraud [Li et al., 2022], fake reviews [Yu et al., 2023], and spamming [Hu et al., 2023]. Earlier works treat GFD as a class-imbalance classification problem and incorpora...

  43. [43]

    $\min \mathcal{L}_{OC} \iff \min \sum_i \bigl\| \sum_k g^{[L]}_{i,k}\, e^{[L]}_k(h^{[L-1]}_i, A) \bigr\|_2$

    From this, it follows that minimizing $\mathcal{L}_{OC}$ is equivalent to minimizing the $\ell_2$ norm of the gating-weighted expert residuals: $\min \mathcal{L}_{OC} \iff \min \sum_i \bigl\| \sum_k g^{[L]}_{i,k}\, e^{[L]}_k(h^{[L-1]}_i, A) \bigr\|_2$. Therefore, to analy...
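    The step from the one-class loss to the norm can be spelled out under two assumptions taken from the algorithm excerpt rather than from this passage: the score is $s_i = \sigma(\|h^{[L]}_i\|_2)$ and the loss is $\mathcal{L}_{OC} = \frac{1}{N}\sum_i \mathrm{BCE}(s_i, 0)$. That the final embedding $h^{[L]}_i$ equals the gating-weighted sum of expert outputs is a further assumption here, since its definition is not part of the excerpt.

```latex
\begin{aligned}
\mathrm{BCE}(s_i, 0) &= -\log(1 - s_i)
  = -\log\bigl(1 - \sigma(x_i)\bigr)
  = \log\bigl(1 + e^{x_i}\bigr),
  \qquad x_i = \bigl\| h^{[L]}_i \bigr\|_2, \\
\frac{\partial}{\partial x_i} \log\bigl(1 + e^{x_i}\bigr) &= \sigma(x_i) > 0 .
\end{aligned}
```

    Each loss term is therefore strictly increasing in the embedding norm, so lowering $\mathcal{L}_{OC}$ amounts to shrinking those norms, which is the direction of the stated equivalence.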

  44. [44]

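    16: end for
    17: // Inference phase
    18: $H^{[L]} \leftarrow f(H^{[0]}, A)$
    19: Compute $s$, where $s_i = \sigma(\|h^{[L]}_i\|_2)$
    20: return $s$
    et al., 2024] in GFD, thereby reducing performance

    $\leftarrow \mathrm{LLM}(T)$
    3: // Training phase
    4: for epoch $= 1, \dots, E$ do
    5:   $\mathcal{L}_{expert} \leftarrow 0$
    6:   $\mathcal{L}_{gating} \leftarrow 0$
    7:   for $l = 1, \dots, L$ do
    8:     $H^{[l]} \leftarrow f^{[l]}(H^{[l-1]}, A)$
    9:     $\mathcal{L}_{expert} \leftarrow \mathcal{L}_{expert} + \sum_k \frac{1}{N} \sum_{i=1}^{N} \|e^{[l]}_k(h^{[l-1]}_i, A)\|_2^2$
    10:    $\mathcal{L}_{gating} \leftarrow \mathcal{L}_{gating} + \sum_k -\frac{1}{N} \sum_{i=1}^{N} g^{[l]}_{i,k} \log(g^{[l]}_{i,k} + \epsilon)$
    11:  end for
    12:  Compute $s$, where $s_i = \sigma(\|h^{[L]}_i\|_2)$
    13:  $\mathcal{L}_{OC} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{BCE}(s_i, 0)$
    14:  $\mathcal{L} = \mathcal{L}_{expert} + \alpha \mathcal{L}_{gating}$ ...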
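The two per-layer accumulations in the training excerpt (steps 9 and 10) can be sketched in plain Python for a single layer. This is an illustrative reading of the excerpt, not the released code: the function names, the nested-list layout of `residuals` and `gates`, and the toy numbers are all assumptions. The remaining terms of the total loss are truncated in the excerpt and are deliberately left out.

```python
import math

def expert_loss(residuals):
    """Sum over experts of the mean squared L2 norm of their outputs.

    residuals[k][i] is the (hypothetical) output vector of expert k for node i.
    """
    total = 0.0
    for per_node in residuals:
        n = len(per_node)
        total += sum(sum(x * x for x in vec) for vec in per_node) / n
    return total

def gating_loss(gates, eps=1e-8):
    """Mean entropy of the gate distribution over nodes.

    gates[i][k] is node i's weight on expert k; eps guards log(0).
    """
    n = len(gates)
    return -sum(g * math.log(g + eps) for row in gates for g in row) / n

# Toy values, made up for the example: 2 experts, 2 nodes, 2-dim outputs.
residuals = [
    [[1.0, 0.0], [0.0, 2.0]],  # expert 0
    [[0.0, 0.0], [1.0, 1.0]],  # expert 1
]
uniform_gates = [[0.5, 0.5], [0.5, 0.5]]
one_hot_gates = [[1.0, 0.0], [1.0, 0.0]]

l_expert = expert_loss(residuals)
l_gating_uniform = gating_loss(uniform_gates)
l_gating_one_hot = gating_loss(one_hot_gates)
```

Under this reading the gating term is an entropy, so it is largest for uniform gates and close to zero for one-hot gates; minimizing it would push each node toward a decisive choice among experts.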