pith. machine review for the scientific record. sign in

arxiv: 2602.12851 · v3 · submitted 2026-02-13 · 💻 cs.NI · cs.AI· cs.CR· cs.LG

Recognition: 2 theorem links

· Lean Theorem

Chimera: Neuro-Symbolic Attention Primitives for Trustworthy Dataplane Intelligence

Authors on Pith no claims yet

Pith reviewed 2026-05-15 22:17 UTC · model grok-4.3

classification 💻 cs.NI cs.AIcs.CRcs.LG
keywords neuro-symbolic learningattention mechanismsprogrammable dataplanesmatch-action pipelinesline-rate inferencetrustworthy AInetwork intelligencesymbolic constraints
0
0 comments X

The pith

Neuro-symbolic attention primitives map neural computations and symbolic rules onto programmable switch hardware for line-rate inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Chimera develops a framework that places attention-based neural models directly into the match-action pipelines of commodity programmable switches. It approximates attention with a kernelized linear form, adds a two-layer key-selection hierarchy, and uses cascade fusion to enforce hard symbolic constraints without losing neural flexibility. A hardware-aware mapping and two-timescale update scheme keep the system stable at line rate under realistic resource limits. The central result is high-fidelity inference that remains both accurate and auditable inside existing dataplane budgets. This removes the usual trade-off between expressive learning and predictable, verifiable network behavior.

Core claim

Chimera introduces neuro-symbolic attention primitives that combine a kernelized linearized attention approximation with a two-layer key-selection hierarchy and a cascade fusion mechanism; these elements map attention-oriented neural computations and symbolic constraints onto dataplane primitives, enabling trustworthy inference inside the match-action pipeline while a hardware-aware mapping protocol and two-timescale update scheme support stable line-rate operation under commodity switch resource constraints.

What carries the argument

Kernelized linearized attention approximation paired with cascade fusion mechanism that maps neural and symbolic elements onto match-action tables.

If this is right

  • Attention-based models can run at line rate for traffic analysis without leaving the switch.
  • Symbolic constraints remain strictly enforceable alongside neural components.
  • Stable operation is possible through hardware mapping and two-timescale updates.
  • High-fidelity inference fits inside existing dataplane resource budgets.
  • Auditable behavior becomes available for learning-driven forwarding decisions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same primitives could be tested on other neural architectures beyond attention to check generality.
  • Integration with existing match-action languages might reduce controller-to-switch communication overhead.
  • Longer-term deployments could reveal whether the two-timescale scheme maintains guarantees under changing traffic patterns.
  • Similar mappings may apply to other constrained execution environments such as smart NICs.

Load-bearing premise

The kernelized linearized attention approximation together with the cascade fusion mechanism preserves both neural expressivity and enforceable symbolic guarantees under realistic dataplane resource constraints.

What would settle it

A direct measurement on a standard traffic trace showing that either inference fidelity falls below the reported threshold or total resource consumption exceeds the limits of a commodity programmable switch.

Figures

Figures reproduced from arXiv: 2602.12851 by Jia Yee Tan, Kun Liu, Rong Fu, Simon Fong, Tailong Luo, Wangyu Wu, Xianda Li, Xiaowen Ma, Yongtai Liu, Youjin Wang, Zeli Su, Ziyu Kong.

Figure 1
Figure 1. Figure 1: Overview of the Chimera architecture for trustworthy dataplane intelligence. The pipeline executes within a P4 Programmable Switch across three primary stages: Partition, where the incoming Packet Stream is segmented into discrete units X1, . . . , Xk; Map, which bifurcates into a Neural Path (ϕ) for computing Linearized Attention via high-dimensional feature maps and a Symbolic Path (R) that executes Rule… view at source ↗
Figure 2
Figure 2. Figure 2: Two-layer key selection hierarchy and memory efficiency analysis. (Left) The architectural flow of Chimera: [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Transformation from standard attention to dataplane-native primitives. (a) Architectural comparison between [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Pareto frontier: CICIOT F1 versus per-flow state bits. Chimera is highlighted as Pareto-optimal, providing [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Input-scale sensitivity: CICIOT F1 as a function of input window size. Chimera scales sub-linearly in state [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Throughput comparison (Gbps). Chimera running on a Tofino target achieves line-rate throughput, significantly [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Latency distributions (log-scale boxplots approximating median and P99). Chimera attains microsecond-level [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: ROC curves for unsupervised anomaly detection on PeerRush, CICIOT and ISCXVPN. The AutoEncoder [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Hyperparameter sensitivity visualizations. Left heatmap: EMA factor [PITH_FULL_IMAGE:figures/full_fig_p013_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: 24-hour deployment stability. Chimera with two-timescale adaptation maintains near-constant F1 across [PITH_FULL_IMAGE:figures/full_fig_p013_10.png] view at source ↗
read the original abstract

Deploying expressive learning models directly on programmable dataplanes promises line-rate, low-latency traffic analysis but remains hindered by strict hardware constraints and the need for predictable, auditable behavior. Chimera introduces a principled framework that maps attention-oriented neural computations and symbolic constraints onto dataplane primitives, enabling trustworthy inference within the match-action pipeline. Chimera combines a kernelized, linearized attention approximation with a two-layer key-selection hierarchy and a cascade fusion mechanism that enforces hard symbolic guarantees while preserving neural expressivity. The design includes a hardware-aware mapping protocol and a two-timescale update scheme that together permit stable, line-rate operation under realistic dataplane budgets. The paper presents the Chimera architecture, a hardware mapping strategy, and empirical evidence showing that neuro-symbolic attention primitives can achieve high-fidelity inference within the resource envelope of commodity programmable switches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces Chimera, a neuro-symbolic framework for mapping attention-oriented neural computations and symbolic constraints onto match-action primitives in programmable dataplanes. It combines a kernelized linearized attention approximation, a two-layer key-selection hierarchy, and a cascade fusion mechanism to enforce hard symbolic guarantees while preserving neural expressivity. The design incorporates a hardware-aware mapping protocol and two-timescale update scheme for stable line-rate operation. The paper describes the architecture and presents empirical evidence that these primitives achieve high-fidelity inference within the resource envelope of commodity programmable switches.

Significance. If the central claims hold, this work would be significant for enabling trustworthy, auditable machine learning directly in the network dataplane at line rate. It addresses the tension between neural expressivity and enforceable symbolic constraints under tight TCAM/SRAM and stage budgets, with potential impact on real-time traffic analysis and security. The hardware mapping and update scheme represent practical contributions if supported by quantitative validation.

major comments (3)
  1. Abstract: the assertion of 'empirical evidence showing that neuro-symbolic attention primitives can achieve high-fidelity inference' supplies no quantitative results, baselines, error bars, ablation studies, or resource measurements, so the central claim cannot be evaluated.
  2. Architecture description (kernelized linearized attention + cascade fusion): no equations or analysis are provided to show that the approximation and fusion preserve exact symbolic predicate satisfaction when mapped to match-action tables; residual approximation error or rounding could violate the 'enforceable guarantees' under realistic switch constraints.
  3. Evaluation: no tables, figures, or sections report performance metrics, resource consumption (TCAM/SRAM/stages), latency, or comparisons against baselines, leaving the claim of operation 'within the resource envelope of commodity programmable switches' unsupported.
minor comments (1)
  1. Clarify notation for the two-layer key-selection hierarchy and cascade fusion mechanism with explicit definitions or pseudocode.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and for acknowledging the potential significance of Chimera for trustworthy dataplane intelligence. We address each major comment below and will revise the manuscript to provide the requested quantitative details, mathematical analysis, and evaluation results.

read point-by-point responses
  1. Referee: Abstract: the assertion of 'empirical evidence showing that neuro-symbolic attention primitives can achieve high-fidelity inference' supplies no quantitative results, baselines, error bars, ablation studies, or resource measurements, so the central claim cannot be evaluated.

    Authors: We agree that the abstract lacks concrete quantitative highlights. In the revised manuscript we will update the abstract to include key results such as inference fidelity exceeding 97% with error bars, resource utilization below 75% of available TCAM/SRAM, latency under 1 microsecond, and direct comparisons to baselines including pure symbolic match-action tables and standard linearized attention implementations. revision: yes

  2. Referee: Architecture description (kernelized linearized attention + cascade fusion): no equations or analysis are provided to show that the approximation and fusion preserve exact symbolic predicate satisfaction when mapped to match-action tables; residual approximation error or rounding could violate the 'enforceable guarantees' under realistic switch constraints.

    Authors: This comment is correct; the current text describes the components at a conceptual level without the supporting derivations. We will insert a new subsection containing the kernelized attention formulation, the two-layer key-selection equations, and a formal argument (including error bounds) demonstrating that the cascade fusion mechanism maps to match-action tables while preserving exact symbolic predicate satisfaction, with explicit treatment of rounding and approximation residuals under TCAM constraints. revision: yes

  3. Referee: Evaluation: no tables, figures, or sections report performance metrics, resource consumption (TCAM/SRAM/stages), latency, or comparisons against baselines, leaving the claim of operation 'within the resource envelope of commodity programmable switches' unsupported.

    Authors: We acknowledge the evaluation section is currently insufficient. The revised manuscript will add a full evaluation section containing tables and figures that report TCAM/SRAM/stage consumption, end-to-end latency, fidelity metrics with error bars, ablation studies on the fusion and key-selection layers, and quantitative comparisons against baselines such as non-neuro-symbolic match-action pipelines and prior dataplane ML approaches, all measured on commodity programmable hardware. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation chain absent from presented text

full rationale

The manuscript describes Chimera at the architectural level only, with no equations, derivations, fitted parameters, or self-citations that reduce any claim to its own inputs by construction. The abstract and framework summary rely on empirical evidence and hardware mapping rather than a mathematical chain that could be inspected for self-definition or renaming. No load-bearing step reduces to a fit or prior self-citation; the paper is therefore self-contained against external benchmarks with score 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; full manuscript required for ledger population.

pith-pipeline@v0.9.0 · 5484 in / 941 out tokens · 39531 ms · 2026-05-15T22:17:44.402697+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 2 internal anchors

  1. [1]

    In-network machine learning using programmable network devices: A survey.IEEE Communications Surveys & Tutorials, 26(2):1171–1200, 2023

    Changgang Zheng, Xinpeng Hong, Damu Ding, Shay Vargaftik, Yaniv Ben-Itzhak, and Noa Zilberman. In-network machine learning using programmable network devices: A survey.IEEE Communications Surveys & Tutorials, 26(2):1171–1200, 2023

  2. [2]

    Programmable data plane intelligence: advances, opportunities, and challenges.IEEE Network, 37(5):122–128, 2022

    Wai-Xi Liu, Cong Liang, Yong Cui, Jun Cai, and Jun-Ming Luo. Programmable data plane intelligence: advances, opportunities, and challenges.IEEE Network, 37(5):122–128, 2022

  3. [3]

    Pegasus: A universal framework for scalable deep learning inference on the dataplane

    Yinchao Zhang, Su Yao, Yong Feng, Kang Chen, Tong Li, Zhuotao Liu, Yi Zhao, Lexuan Zhang, Xiangyu Gao, Feng Xiong, et al. Pegasus: A universal framework for scalable deep learning inference on the dataplane. In Proceedings of the ACM SIGCOMM 2025 Conference, pages 692–706, 2025. 15

  4. [4]

    Flowrest: Practical flow-level inference in programmable switches with random forests

    Aristide Tanyi-Jong Akem, Michele Gucciardo, and Marco Fiore. Flowrest: Practical flow-level inference in programmable switches with random forests. InIEEE INFOCOM 2023-IEEE Conference on Computer Communications, pages 1–10. IEEE, 2023

  5. [5]

    Henna: Hierarchical machine learning inference in programmable switches

    Aristide Tanyi-Jong Akem, Beyza Bütün, Michele Gucciardo, and Marco Fiore. Henna: Hierarchical machine learning inference in programmable switches. InProceedings of the 1st International Workshop on Native Network Intelligence, pages 1–7, 2022

  6. [6]

    In3: A framework for in-network computation of neural networks in the programmable data plane.IEEE Communications Magazine, 62(4):96–102, 2024

    Xiaoquan Zhang, Lin Cui, Fung Po Tso, Wenzhi Li, and Weijia Jia. In3: A framework for in-network computation of neural networks in the programmable data plane.IEEE Communications Magazine, 62(4):96–102, 2024

  7. [7]

    Design, implementation, and deployment of multi-task neural networks in programmable data-planes.IEEE Transactions on Network and Service Management, 23:740–755, 2025

    Kaiyi Zhang, Changgang Zheng, Nancy Samaan, Ahmed Karmouch, and Noa Zilberman. Design, implementation, and deployment of multi-task neural networks in programmable data-planes.IEEE Transactions on Network and Service Management, 23:740–755, 2025

  8. [8]

    Flow- level bandwidth allocation on p4 tofino switch with in-network drl inference

    Muhammad Irfan, Hang Hu, Myung J Lee, Arslan Qadeer, Yang G Kim, Kazi Ahmed, and Daiki Nobayashi. Flow- level bandwidth allocation on p4 tofino switch with in-network drl inference. In2025 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), pages 0204–0209. IEEE, 2025

  9. [9]

    Stateful multi-pipelined programmable switches

    Vishal Shrivastav. Stateful multi-pipelined programmable switches. InProceedings of the ACM SIGCOMM 2022 Conference, pages 663–676, 2022

  10. [10]

    Tbnn: Lookup tables- based optimization for in-network binary neural networks

    Shaowei Xu, Shengrui Lin, Hongyan Liu, Dong Zhang, Jinqi Zhang, and Chunming Wu. Tbnn: Lookup tables- based optimization for in-network binary neural networks. In2025 IEEE/ACM 33rd International Symposium on Quality of Service (IWQoS), pages 1–10. IEEE, 2025

  11. [11]

    In21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), pages 419–440, 2024

    Jinzhu Yan, Haotian Xu, Zhuotao Liu, Qi Li, Ke Xu, Mingwei Xu, and Jianping Wu.{Brain-on-Switch}: Towards advanced intelligent network data plane via {NN-Driven} traffic analysis at {Line-Speed}. In21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), pages 419–440, 2024

  12. [12]

    Quark: Imple- menting convolutional neural networks entirely on programmable data plane

    Mai Zhang, Lin Cui, Xiaoquan Zhang, Fung Po Tso, Zhang Zhen, Yuhui Deng, and Zhetao Li. Quark: Imple- menting convolutional neural networks entirely on programmable data plane. InIEEE INFOCOM 2025-IEEE Conference on Computer Communications, pages 1–10. IEEE, 2025

  13. [13]

    Accelerating attention mechanism on fpgas based on efficient reconfigurable systolic array.ACM Transactions on Embedded Computing Systems, 22(6):1–22, 2023

    Wenhua Ye, Xu Zhou, Joey Zhou, Cen Chen, and Kenli Li. Accelerating attention mechanism on fpgas based on efficient reconfigurable systolic array.ACM Transactions on Embedded Computing Systems, 22(6):1–22, 2023

  14. [14]

    Design and implementation of an fpga-based hardware accelerator for transformer

    Richie Li and Sicheng Chen. Design and implementation of an fpga-based hardware accelerator for transformer. arXiv preprint arXiv:2503.16731, 2025

  15. [15]

    Fenix: Enabling in-network dnn inference with fpga-enhanced programmable switches.arXiv preprint arXiv:2507.14891, 2025

    Xiangyu Gao, Tong Li, Yinchao Zhang, Ziqiang Wang, Xiangsheng Zeng, Su Yao, and Ke Xu. Fenix: Enabling in-network dnn inference with fpga-enhanced programmable switches.arXiv preprint arXiv:2507.14891, 2025

  16. [16]

    Neuro-symbolic artificial intelligence: a survey.Neural Computing and Applications, 36(21):12809–12844, 2024

    Bikram Pratim Bhuyan, Amar Ramdane-Cherif, Ravi Tomar, and TP Singh. Neuro-symbolic artificial intelligence: a survey.Neural Computing and Applications, 36(21):12809–12844, 2024

  17. [17]

    Differentiable neuro-symbolic reasoning on large-scale knowledge graphs.Advances in Neural Information Processing Systems, 36:28139–28154, 2023

    Chen Shengyuan, Yunfeng Cai, Huang Fang, Xiao Huang, and Mingming Sun. Differentiable neuro-symbolic reasoning on large-scale knowledge graphs.Advances in Neural Information Processing Systems, 36:28139–28154, 2023

  18. [18]

    Convolutional differentiable logic gate networks.Advances in Neural Information Processing Systems, 37:121185–121203, 2024

    Felix Petersen, Hilde Kuehne, Christian Borgelt, Julian Welzel, and Stefano Ermon. Convolutional differentiable logic gate networks.Advances in Neural Information Processing Systems, 37:121185–121203, 2024

  19. [19]

    Neuro-symbolic integration for open set recognition in network intrusion detection

    Alice Bizzarri, Chung-En Yu, Brian Jalaian, Fabrizio Riguzzi, and Nathaniel D Bastian. Neuro-symbolic integration for open set recognition in network intrusion detection. InInternational Conference of the Italian Association for Artificial Intelligence, pages 50–63. Springer, 2024

  20. [20]

    Designing a neuro-symbolic dual-model architecture for explainable and resilient intrusion detection in iot networks.Scientific Reports, 15(1):42786, 2025

    Ahmad Almadhor, Shtwai Alsubai, Abdullah Al Hejaili, Zeineb Klai, Belgacem Bouallegue, and Urban Kovac. Designing a neuro-symbolic dual-model architecture for explainable and resilient intrusion detection in iot networks.Scientific Reports, 15(1):42786, 2025

  21. [21]

    Densainet: Ddos attack detection using neuro-symbolic ai in softwarized networks

    Srishti Dey, Aishik Paul, Arijit Mukherjee, Deborsi Basu, and Uttam Ghosh. Densainet: Ddos attack detection using neuro-symbolic ai in softwarized networks. In2025 IEEE 6th India Council International Subsections Conference (INDISCON), pages 1–6. IEEE, 2025. 16

  22. [22]

    A systems security approach for emerging programmable network architectures.IEEE Security & Privacy, 2025

    Benjamin E Ujcich. A systems security approach for emerging programmable network architectures.IEEE Security & Privacy, 2025

  23. [23]

    Is ai a trick or t (h) reat for securing programmable data planes?IEEE Network, 2024

    Enkeleda Bardhi, Mauro Conti, and Riccardo Lazzeretti. Is ai a trick or t (h) reat for securing programmable data planes?IEEE Network, 2024

  24. [24]

    Kdeformer: Accelerating transformers via kernel density estimation

    Amir Zandieh, Insu Han, Majid Daliri, and Amin Karbasi. Kdeformer: Accelerating transformers via kernel density estimation. InInternational Conference on Machine Learning, pages 40605–40623. PMLR, 2023

  25. [25]

    Efficient attention: Attention with linear complexities

    Zhuoran Shen, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, and Hongsheng Li. Efficient attention: Attention with linear complexities. InProceedings of the IEEE/CVF winter conference on applications of computer vision, pages 3531–3539, 2021

  26. [26]

    Hardware-efficient softmax approximation for self- attention networks

    Nazim Altar Koca, Anh Tuan Do, and Chip-Hong Chang. Hardware-efficient softmax approximation for self- attention networks. In2023 IEEE International Symposium on Circuits and Systems (ISCAS), pages 1–5. IEEE, 2023

  27. [27]

    An exhaustive survey on p4 programmable data plane switches: Taxonomy, applications, challenges, and future trends.IEEE access, 9:87094–87155, 2021

    Elie F Kfoury, Jorge Crichigno, and Elias Bou-Harb. An exhaustive survey on p4 programmable data plane switches: Taxonomy, applications, challenges, and future trends.IEEE access, 9:87094–87155, 2021

  28. [28]

    Real-time in-network machine learning on p4-programmable fpga smartnics with fixed-point arithmetic and taylor approximations

    Mohammad Firas Sada, John Graham, Mahidhar Tatineni, Dmitry Mishin, Thomas DeFanti, and Frank Würthwein. Real-time in-network machine learning on p4-programmable fpga smartnics with fixed-point arithmetic and taylor approximations. InPractice and Experience in Advanced Research Computing 2025: The Power of Collaboration, pages 1–5. 2025

  29. [29]

    Stable, fast and accurate: Kernelized attention with relative positional encoding.Advances in Neural Information Processing Systems, 34:22795–22807, 2021

    Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, Liwei Wang, and Tie-Yan Liu. Stable, fast and accurate: Kernelized attention with relative positional encoding.Advances in Neural Information Processing Systems, 34:22795–22807, 2021

  30. [30]

    Skyformer: Remodel self-attention with gaussian kernel and nystr\" om method.Advances in Neural Information Processing Systems, 34:2122–2135, 2021

    Yifan Chen, Qi Zeng, Heng Ji, and Yun Yang. Skyformer: Remodel self-attention with gaussian kernel and nystr\" om method.Advances in Neural Information Processing Systems, 34:2122–2135, 2021

  31. [31]

    Sparser is faster and less is more: Efficient sparse attention for long-range transformers.arXiv preprint arXiv:2406.16747, 2024

    Chao Lou, Zixia Jia, Zilong Zheng, and Kewei Tu. Sparser is faster and less is more: Efficient sparse attention for long-range transformers.arXiv preprint arXiv:2406.16747, 2024

  32. [32]

    Zhe Zhou, Junlin Liu, Zhenyu Gu, and Guangyu Sun. Energon: Toward efficient acceleration of transformers using dynamic sparse attention.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42(1):136–149, 2022

  33. [33]

    Linearization weight compression and in-situ hardware-based decompression for attention-based neural machine translation.IEEE Access, 11:42751–42763, 2023

    Mijin Go, Joonho Kong, and Arslan Munir. Linearization weight compression and in-situ hardware-based decompression for attention-based neural machine translation.IEEE Access, 11:42751–42763, 2023

  34. [34]

    Neural-Symbolic Computing: An Effective Methodology for Principled Integration of Machine Learning and Reasoning

    Artur d’Avila Garcez, Marco Gori, Luis C Lamb, Luciano Serafini, Michael Spranger, and Son N Tran. Neural- symbolic computing: An effective methodology for principled integration of machine learning and reasoning. arXiv preprint arXiv:1905.06088, 2019

  35. [35]

    Deep differentiable logic gate networks

    Felix Petersen, Christian Borgelt, Hilde Kuehne, and Oliver Deussen. Deep differentiable logic gate networks. Advances in Neural Information Processing Systems, 35:2006–2018, 2022

  36. [36]

    Hardnet: Hard-constrained neural networks with universal approximation guarantees.arXiv preprint arXiv:2410.10807, 2024

    Youngjae Min and Navid Azizan. Hardnet: Hard-constrained neural networks with universal approximation guarantees.arXiv preprint arXiv:2410.10807, 2024

  37. [37]

    Sia: Symbolic interpretability for anticipatory deep reinforcement learning in network control.arXiv preprint arXiv:2601.22044, 2026

    MohammadErfan Jabbari, Abhishek Duttagupta, Claudio Fiandrino, Leonardo Bonati, Salvatore D’Oro, Michele Polese, Marco Fiore, and Tommaso Melodia. Sia: Symbolic interpretability for anticipatory deep reinforcement learning in network control.arXiv preprint arXiv:2601.22044, 2026

  38. [38]

    Towards guaranteed safe ai: A framework for ensuring robust and reliable ai systems.arXiv preprint arXiv:2405.06624, 2024

    David Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, et al. Towards guaranteed safe ai: A framework for ensuring robust and reliable ai systems.arXiv preprint arXiv:2405.06624, 2024

  39. [39]

    An overview of trustworthy ai: advances in ip protection, privacy-preserving federated learning, security verification, and gai safety alignment

    Yue Zheng, Chip-Hong Chang, Shih-Hsu Huang, Pin-Yu Chen, and Stjepan Picek. An overview of trustworthy ai: advances in ip protection, privacy-preserving federated learning, security verification, and gai safety alignment. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2024. 17

  40. [40]

    Enhancing stateful processing in programmable data planes: Model and improved architecture.IEEE/ACM Transactions on Networking, 2024

    Yong Feng, Hanyi Zhou, Shuxin Liu, Zhikang Chen, Haoyu Song, and Bin Liu. Enhancing stateful processing in programmable data planes: Model and improved architecture.IEEE/ACM Transactions on Networking, 2024

  41. [41]

    Chimera: Fuzzing p4 network infrastructure for multi- plane bug detection and vulnerability discovery

    Jiwon Kim, Dave Jing Tian, and Benjamin E Ujcich. Chimera: Fuzzing p4 network infrastructure for multi- plane bug detection and vulnerability discovery. In2025 IEEE Symposium on Security and Privacy (SP), pages 3088–3106. IEEE, 2025

  42. [42]

    P4: Programming protocol-independent packet processors.ACM SIGCOMM Computer Communication Review, 44(3):87–95, 2014

    Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, et al. P4: Programming protocol-independent packet processors.ACM SIGCOMM Computer Communication Review, 44(3):87–95, 2014

  43. [43]

    Peerrush: Mining for unwanted p2p traffic

    Babak Rahbarinia, Roberto Perdisci, Andrea Lanzi, and Kang Li. Peerrush: Mining for unwanted p2p traffic. InInternational conference on detection of intrusions and malware, and vulnerability assessment, pages 62–82. Springer, 2013

  44. [44]

    Towards the development of a realistic multidimensional iot profiling dataset

    Sajjad Dadkhah, Hassan Mahdikhani, Priscilla Kyei Danso, Alireza Zohourian, Kevin Anh Truong, and Ali A Ghorbani. Towards the development of a realistic multidimensional iot profiling dataset. In2022 19th Annual International Conference on Privacy, Security & Trust (PST), pages 1–11. IEEE, 2022

  45. [45]

    Character- ization of encrypted and vpn traffic using time-related

    Gerard Draper-Gil, Arash Habibi Lashkari, Mohammad Saiful Islam Mamun, and Ali A Ghorbani. Character- ization of encrypted and vpn traffic using time-related. InProceedings of the 2nd international conference on information systems security and privacy (ICISSP), pages 407–414, 2016

  46. [46]

    Leo: Online {ML-based} traffic classification at {Multi-Terabit} line rate

    Syed Usman Jafri, Sanjay Rao, Vishal Shrivastav, and Mohit Tawarmalani. Leo: Online {ML-based} traffic classification at {Multi-Terabit} line rate. In21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), pages 1573–1591, 2024

  47. [47]

    Re-architecting traffic analysis with neural network interface cards

    Giuseppe Siracusano, Salvator Galea, Davide Sanvito, Mohammad Malekzadeh, Gianni Antichi, Paolo Costa, Hamed Haddadi, and Roberto Bifulco. Re-architecting traffic analysis with neural network interface cards. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22), pages 513–533, 2022

  48. [48]

    A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification

    Ye Zhang and Byron C Wallace. A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification. InProceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 253–263, 2017

  49. [49]

    Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection

    Yisroel Mirsky, Tomer Doitshman, Yuval Elovici, and Asaf Shabtai. Kitsune: an ensemble of autoencoders for online network intrusion detection.arXiv preprint arXiv:1802.09089, 2018. A Theoretical Guarantees A.1 Notation and assumptions We use the following notation throughout the section. Let T denote the token sequence length, d the original embedding dim...