pith. machine review for the scientific record.

arxiv: 2604.14457 · v1 · submitted 2026-04-15 · 💻 cs.CR

Recognition: unknown

NeuroTrace: Inference Provenance-Based Detection of Adversarial Examples

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 12:26 UTC · model grok-4.3

classification 💻 cs.CR
keywords adversarial detection · inference provenance · provenance graphs · graph neural networks · deep neural networks · machine learning security · cross-layer analysis · runtime auditing

The pith

Inference provenance graphs capture cross-layer dataflow to distinguish adversarial inputs from benign ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method to build graphs that record how activations and parameters interact during a neural network's forward pass. These graphs supply structured cross-layer signals that existing layer-by-layer detectors miss. When used for classification, the signals separate adversarial examples from normal inputs with high accuracy in both single-attack and mixed-attack scenarios. The approach also works across vision and malware domains and beats earlier graph-based detectors. If the signal proves general, it offers a practical route to runtime auditing of model behavior without changing the underlying network.

Core claim

NeuroTrace extracts Inference Provenance Graphs from instrumented model executions; each graph encodes activation values together with the parameter-driven dataflow paths that produced them. Detectors built on these graphs reliably flag adversarial examples under intra-attack, multi-attack, and cross-domain transfer conditions while improving on prior graph baselines. The framework supplies a reusable extraction engine, a standardized graph format, and a public benchmark spanning multiple attack families.

What carries the argument

Inference Provenance Graphs (IPGs), heterogeneous graphs that record both activation behavior and parameter-induced dataflow during the forward pass.
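The idea can be made concrete with a toy sketch. The code below is not the paper's extraction engine; it is a minimal illustration, for a two-layer MLP with an invented significance threshold, of recording activations as graph nodes and keeping only the parameter-driven dataflow edges whose weighted contribution is large:

```python
import numpy as np

# Illustrative-only sketch of an Inference Provenance Graph (IPG) for a tiny
# 2-layer MLP. Nodes carry activation values from one forward pass; edges are
# the weight connections whose contribution |w * activation| exceeds a
# made-up threshold, approximating "parameter-induced dataflow".

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # layer-1 weights: 3 inputs -> 4 hidden units
W2 = rng.normal(size=(2, 4))   # layer-2 weights: 4 hidden -> 2 outputs

def extract_ipg(x, threshold=0.5):
    """Run one forward pass; return (nodes, edges) of a toy provenance graph."""
    h = np.maximum(W1 @ x, 0.0)          # hidden activations (ReLU)
    y = W2 @ h                           # output activations
    nodes = {f"in{i}": v for i, v in enumerate(x)}
    nodes.update({f"h{i}": v for i, v in enumerate(h)})
    nodes.update({f"out{i}": v for i, v in enumerate(y)})
    edges = []
    # Keep only edges whose weighted contribution to the next layer is large.
    for j in range(W1.shape[0]):
        for i in range(W1.shape[1]):
            if abs(W1[j, i] * x[i]) > threshold:
                edges.append((f"in{i}", f"h{j}", W1[j, i]))
    for j in range(W2.shape[0]):
        for i in range(W2.shape[1]):
            if abs(W2[j, i] * h[i]) > threshold:
                edges.append((f"h{i}", f"out{j}", W2[j, i]))
    return nodes, edges

nodes, edges = extract_ipg(np.array([1.0, -0.5, 0.3]))
```

In the paper's setting the resulting heterogeneous graph, not the raw activations, is what a GNN classifier consumes; this sketch only shows the shape of the object.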

If this is right

  • IPG detectors maintain high performance when trained on one attack family and tested on others.
  • The same graphs improve detection accuracy over earlier graph-based methods in both vision and malware tasks.
  • Runtime and storage costs of provenance extraction can be measured and traded off against detection quality.
  • Releasing the extraction pipeline and dataset enables repeatable study of inference-time information flow.
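The first bullet implies a simple check: score held-out graphs from an unseen attack family and compare the resulting ROC AUC against the intra-attack figure. A minimal, pure-Python rank-statistic AUC suffices for the comparison (the scores below are invented for illustration, not the paper's numbers):

```python
def roc_auc(scores_pos, scores_neg):
    """Probability that a positive (adversarial) score exceeds a negative
    (benign) one, counting ties as one half -- the rank-statistic form of
    ROC AUC: 1.0 is a perfect detector, 0.5 is chance."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical detector scores: trained on attack family A, tested on family B.
adv_scores = [0.9, 0.8, 0.75, 0.6]   # adversarial inputs (should score high)
ben_scores = [0.2, 0.3, 0.4, 0.6]    # benign inputs (should score low)

auc = roc_auc(adv_scores, ben_scores)  # -> 0.96875 for these toy scores
```

If transfer holds, the cross-family AUC stays close to the intra-attack one; a collapse toward 0.5 would undercut the first bullet.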

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Provenance tracking could be combined with existing monitoring tools to create layered defenses that audit both inputs and internal execution.
  • If IPGs generalize beyond adversarial examples, similar graphs might flag other runtime anomalies such as model poisoning or distribution shift.
  • The open dataset allows direct comparison of provenance signals against activation-only or attribution-only baselines on the same inputs.

Load-bearing premise

That the cross-layer patterns captured in the graphs remain informative enough to separate adversarial from benign inputs even when the attack type or application domain changes.

What would settle it

A new attack family or domain where an IPG-based detector achieves no better than chance accuracy on held-out adversarial examples while layer-local detectors still succeed.
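One way to operationalize "no better than chance" is a label-permutation test on the detector's scores for the new attack family: if shuffled labels reach the observed AUC about as often as not, chance cannot be rejected. A sketch under that framing, with all numbers hypothetical:

```python
import random

def roc_auc(pos, neg):
    """Rank-statistic ROC AUC; ties count one half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_pvalue(pos, neg, trials=2000, seed=0):
    """Fraction of label shufflings whose AUC meets or exceeds the observed one."""
    rng = random.Random(seed)
    observed = roc_auc(pos, neg)
    pooled = list(pos) + list(neg)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        if roc_auc(pooled[:len(pos)], pooled[len(pos):]) >= observed:
            hits += 1
    return hits / trials

# Scores nearly indistinguishable between classes -> AUC near 0.5,
# and a large p-value: the detector cannot be distinguished from chance.
adv = [0.4, 0.6, 0.5, 0.55]
ben = [0.45, 0.6, 0.5, 0.52]
p = permutation_pvalue(adv, ben)
```

The falsifying scenario above would show a large p-value for the IPG detector on the new family while a layer-local detector's p-value stays small on the same inputs.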

Figures

Figures reproduced from arXiv: 2604.14457 by Birhanu Eshete, Firas Ben Hmida, Kashif Ali Khan, Philemon Hailemariam.

Figure 1
Figure 1: NeuroTrace pipeline. Given a trained model and input, the framework extracts an input-dependent Inference Provenance Graph (IPG), represents it as a heterogeneous graph, and applies a graph classifier for downstream tasks such as adversarial detection. view at source ↗
Figure 2
Figure 2: ROC curves under intra-attack evaluation. … view at source ↗
Figure 4
Figure 4: Training and validation loss for the multi-attack… view at source ↗
Figure 5
Figure 5: Training and validation accuracy for the multi… view at source ↗
read the original abstract

Deep neural networks (DNNs) remain largely opaque at inference time, limiting our ability to detect and diagnose malicious input manipulations such as adversarial examples. Existing detection methods predominantly rely on layer-local signals (e.g., activations or attribution scores), leaving cross-layer information flow and execution structure under-explored. We introduce NeuroTrace, a framework and open dataset for analyzing inference provenance through Inference Provenance Graphs (IPGs). IPGs are heterogeneous graphs that capture both activation behavior and parameter-induced dataflow during a model's forward pass, providing a structured representation of how information propagates through the network. NeuroTrace includes (i) a reproducible extraction engine that instruments model execution, (ii) a standardized graph representation compatible with heterogeneous GNNs, and (iii) a benchmark suite spanning multiple adversarial attack families across vision and malware domains. Using this framework, we evaluate IPG-based detectors for adversarial example detection under intra-attack, multi-attack, and cross-threat transfer settings. Our results show that inference provenance provides a strong and transferable signal for distinguishing adversarial and benign inputs, achieving consistently high detection performance and improving over prior graph-based baselines. We further analyze the conditions under which provenance-based detection generalizes across attack types, as well as the associated runtime and storage trade-offs. By releasing the dataset, extraction pipeline, and evaluation protocol, NeuroTrace enables systematic study of inference-time behavior and establishes inference provenance as a practical foundation for building more transparent and auditable machine learning systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The paper introduces NeuroTrace, a framework for detecting adversarial examples via Inference Provenance Graphs (IPGs). IPGs are heterogeneous graphs that encode both activation behavior and parameter-induced dataflow during DNN forward passes. The work supplies a reproducible extraction engine, a standardized graph format for heterogeneous GNNs, and a benchmark suite spanning vision and malware domains. Evaluations are performed under intra-attack, multi-attack, and cross-threat transfer regimes, with the central claim that provenance yields a strong, transferable detection signal that improves over prior graph-based baselines. The dataset, pipeline, and protocol are released openly.

Significance. If the benchmark results hold, the contribution is significant because it moves adversarial detection beyond layer-local signals to structured cross-layer provenance information. The open release of the dataset, extraction pipeline, and evaluation protocol is a clear strength that supports reproducibility and systematic follow-on work in inference-time ML analysis. This positions inference provenance as a practical foundation for more transparent and auditable machine-learning systems.

minor comments (1)
  1. [Abstract] The abstract asserts 'consistently high detection performance' and improvement over baselines without numerical values, dataset sizes, or error bars. Adding one or two headline metrics would let readers assess the strength of the claim immediately.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of NeuroTrace, the open release of the dataset and pipeline, and the recommendation for minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper describes an empirical framework for building Inference Provenance Graphs (IPGs) from instrumented DNN forward passes and training heterogeneous GNNs for binary classification of adversarial vs. benign inputs. No equations, derivations, or first-principles claims appear in the provided text; the central results rest on released datasets, extraction pipelines, and benchmark evaluations across attack families and domains. The methodology is evaluated against external benchmarks, with no self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations that would collapse the claimed signal to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review provides no equations, hyperparameters, or detailed methods; therefore no free parameters, axioms, or invented entities can be identified beyond the high-level introduction of IPGs.

invented entities (1)
  • Inference Provenance Graphs (IPGs): no independent evidence
    purpose: heterogeneous graph representation capturing activation behavior and parameter-induced dataflow in DNN forward passes.
    A new structured representation introduced to enable provenance analysis; the abstract provides no independent falsifiable evidence outside the framework itself.

pith-pipeline@v0.9.0 · 5572 in / 1183 out tokens · 39164 ms · 2026-05-10T12:26:57.922910+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

17 extracted references · 9 canonical work pages · 1 internal anchor

  1. [1]

    Abderrahmen Amich and Birhanu Eshete. 2021. Explanation-Guided Diagnosis of Machine Learning Evasion Attacks. In Security and Privacy in Communication Networks, Joaquin Garcia-Alfaro, Shujun Li, Radha Poovendran, Hervé Debar, and Moti Yung (Eds.). Springer International Publishing, Cham, 207–228.

  2. [2]

    H. S. Anderson and P. Roth. 2018. EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models. ArXiv e-prints (2018). arXiv:1804.04637.

  3. [3]

    Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. 2020. Square Attack: a query-efficient black-box adversarial attack via random search. arXiv:1912.00049 [cs.LG]

  4. [4]

    Alexander Binder, Grégoire Montavon, Sebastian Lapuschkin, Klaus-Robert Müller, and Wojciech Samek. 2016. Layer-Wise Relevance Propagation for Neural Networks with Local Renormalization Layers. In Artificial Neural Networks and Machine Learning - ICANN 2016 - 25th International Conference on Artificial Neural Networks, Barcelona, Spain, September 6-9, 20...

  5. [5]

    Francesco Croce and Matthias Hein. 2020. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. arXiv:2003.01690 [cs.LG].

  6. [6]

    Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and Harnessing Adversarial Examples. In 3rd International Conference on Learning Representations, ICLR.

  7. [7]

    Firas Ben Hmida, Abderrahmen Amich, Ata Kaboudi, and Birhanu Eshete. 2025. DeepProv: Behavioral Characterization and Repair of Neural Networks via Inference Provenance Graph Analysis. In IEEE Annual Computer Security Applications Conference, ACSAC 2025, Honolulu, HI, USA, December 8-12, 2025. IEEE, 922–938. doi:10.1109/ACSAC67867.2025.00077.

  8. [8]

    A. Kherchouche, S. A. Fezza, W. Hamidouche, and O. Deforges. 2020. Detection of Adversarial Examples in Deep Neural Networks with Natural Scene Statistics. In 2020 International Joint Conference on Neural Networks (IJCNN). 1–7.

  9. [9]

    Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. 2009. CIFAR-10 (Canadian Institute for Advanced Research). (2009). http://www.cs.toronto.edu/~kriz/cifar.html

  10. [10]

    S. Ma, Y. Liu, G. Tao, W.-C. Lee, and X. Zhang. 2019. NIC: Detecting Adversarial Samples with Neural Network Invariant Checking. In Proceedings 2019 Network and Distributed System Security Symposium.

  11. [11]

    Xingjun Ma, Bo Li, Y. Wang, S. M. Erfani, S. N. R. Wijewickrema, M. E. Houle, G. R. Schoenebeck, D. X. Song, and J. Bailey. 2018. Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality. arXiv:1801.02613.

  12. [12]

    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. Towards Deep Learning Models Resistant to Adversarial Attacks. CoRR abs/1706.06083 (2017).

  13. [13]

    A. Sotgiu, A. Demontis, M. Melis, B. Biggio, G. Fumera, X. Feng, and F. Roli.

  14. [14]

    Deep Neural Rejection against Adversarial Examples. EURASIP Journal on Information Security 2020 (2019).

  15. [15]

    Jonathan Uesato, Brendan O’Donoghue, Aaron van den Oord, and Pushmeet Kohli. 2018. Adversarial Risk and the Dangers of Evaluating Against Weak Attacks. arXiv:1802.05666 [cs.LG]

  16. [16]

    Xiaosen Wang, Zeliang Zhang, and Jianping Zhang. 2023. Structure Invariant Transformation for better Adversarial Transferability. In Proceedings of the IEEE/CVF International Conference on Computer Vision.

  17. [17]

    Fei Zhang, Zhe Li, Yahang Hu, and Yaohua Wang. 2024. CIGA: Detecting Adversarial Samples via Critical Inference Graph Analysis. In 2024 Annual Computer Security Applications Conference (ACSAC). 1231–1244. doi:10.1109/ACSAC63791.2024.00098.