pith. machine review for the scientific record.

arxiv: 2604.26219 · v1 · submitted 2026-04-29 · 💻 cs.CR · cs.LG

Recognition: unknown

eDySec: A Deep Learning-based Explainable Dynamic Analysis Framework for Detecting Malicious Packages in PyPI Ecosystem

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 13:32 UTC · model grok-4.3

classification 💻 cs.CR · cs.LG
keywords malicious package detection · deep learning · dynamic analysis · PyPI ecosystem · explainable AI · software supply chain · behavioral features · false positive reduction

The pith

Deep learning on dynamic package behaviors detects malicious PyPI packages with half the features and 82 percent fewer false positives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces eDySec, a framework that applies deep learning to dynamic behavioral data from PyPI packages, including system calls, network traffic, directory access, and dependency logs captured at install time and afterward. It argues that this handles the high-dimensional and sparse character of such data more effectively than traditional machine learning, while adding stability checks and explainable AI to make decisions reliable and transparent. Evaluation on the QUT-DV25 dataset shows the framework halves feature dimensionality, cuts false positives by 82 percent, false negatives by 79 percent, raises accuracy by 3 percent, reaches near-perfect stability, and runs at 170 milliseconds per package. A sympathetic reader would care because supply-chain attacks on open-source repositories are rising, and practical improvements in detection could reduce the chance that developers unknowingly incorporate malicious code.

Core claim

eDySec is a deep learning-based explainable dynamic analysis framework that outperforms state-of-the-art methods for detecting malicious PyPI packages. It achieves this by evaluating deep learning models on dynamic behavioral features from the QUT-DV25 dataset, selecting the most discriminative attributes, incorporating model stability analysis, and applying explainable AI techniques. The result is halved feature dimensionality, 82 percent lower false positives, 79 percent lower false negatives, 3 percent higher accuracy, near-perfect stability, and 170 ms inference latency per package, with the authors noting that poor feature or model choices can degrade performance.

What carries the argument

The eDySec pipeline, which combines deep learning models applied to selected dynamic behavioral features (install-time and post-installation) with stability analysis and explainable AI to produce efficient and interpretable detections.
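
The shape of that pipeline can be sketched independently of the paper's specifics: a selector that halves a sparse behavioral feature matrix, feeding a small neural classifier. Everything below is illustrative, assuming synthetic stand-in data, mutual-information selection, and an MLP; the authors' actual features, selector, and architecture are not reproduced here.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Illustrative stand-in for dynamic behavioral features: sparse counts of
# syscalls, network events, and file accesses per package (not real data).
n_packages, n_features = 400, 64
X = rng.poisson(0.3, size=(n_packages, n_features)).astype(float)
y = rng.integers(0, 2, size=n_packages)              # 1 = malicious (synthetic labels)
X[y == 1, :4] += rng.poisson(3.0, size=(y.sum(), 4)) # plant signal in 4 features

# Halve dimensionality (the reduction eDySec reports), then fit a small MLP.
detector = make_pipeline(
    SelectKBest(mutual_info_classif, k=n_features // 2),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)
detector.fit(X, y)
print(f"train accuracy: {detector.score(X, y):.2f}")
```

Putting selection and classifier in one pipeline object also matters later: it is what lets cross-validation refit the selector per fold instead of leaking labels.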

If this is right

  • Fewer legitimate packages are incorrectly flagged, reducing unnecessary review burden for developers and repository maintainers.
  • Lower false negatives mean more actual malicious packages are caught before they reach users.
  • Explainable outputs allow security teams to understand and verify the reasons for each detection.
  • Reduced feature count and 170 ms latency make the approach suitable for integration into package installation workflows.
  • The finding that some model-feature combinations degrade performance highlights the need for careful selection in any deployed detector.
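
The 170 ms figure is checkable for any candidate detector with a trivial timing harness. A minimal sketch, assuming a hypothetical linear scoring head over behavioral counts (not the authors' model):

```python
import time
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical trained linear scoring head over ~1000 behavioral features;
# stands in for whatever model a deployed detector would load.
n_features = 1000
weights = rng.normal(size=n_features)
bias = -0.5

def score_package(features: np.ndarray) -> bool:
    """Return True if the package is flagged as malicious."""
    return float(features @ weights + bias) > 0.0

# Time inference over many packages and report per-package latency.
packages = rng.poisson(0.2, size=(500, n_features)).astype(float)
t0 = time.perf_counter()
flags = [score_package(p) for p in packages]
per_pkg_ms = (time.perf_counter() - t0) / len(packages) * 1000
print(f"{per_pkg_ms:.3f} ms per package")
```

A linear head lands far below the 170 ms budget; the point of the harness is that any replacement model can be held to the same number before being wired into an install workflow.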

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the dynamic-behavior approach generalizes, package managers could run similar lightweight scans automatically during installation.
  • The same feature-selection and stability methods might improve detection in other language ecosystems that face analogous supply-chain risks.
  • Combining the dynamic signals emphasized here with static code analysis could address attacks that only appear after installation.
  • The emphasis on model stability suggests the framework could support repeated scans over time without retraining drift.
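
If the first speculation held and package managers ran such scans at install time, the integration point could be a gate between trace collection and installation. The sketch below is hypothetical glue, not part of eDySec: the trace fields, the `allow_install` rule, and the threshold are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class InstallTrace:
    """Hypothetical install-time behavior summary for one package."""
    package: str
    syscall_count: int
    outbound_hosts: int
    paths_outside_prefix: int

def allow_install(trace: InstallTrace, flag_score: float, threshold: float = 0.5) -> bool:
    # flag_score would come from a detector like eDySec; here it is an input.
    # A gate could also combine the score with hard rules on raw behavior.
    if trace.paths_outside_prefix > 0 and trace.outbound_hosts > 0:
        return False  # exfiltration-shaped behavior: block regardless of score
    return flag_score < threshold

trace = InstallTrace("example-pkg", syscall_count=120,
                     outbound_hosts=0, paths_outside_prefix=0)
print(allow_install(trace, flag_score=0.1))  # benign-looking trace, low score
```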

Load-bearing premise

The performance gains measured on the QUT-DV25 dataset will generalize to the full range of real-world PyPI packages without the chosen models and features overfitting to the dataset's particular traits.

What would settle it

Testing eDySec on an independent collection of labeled malicious and benign PyPI packages gathered after the QUT-DV25 dataset would show whether the reported drops in false positives and negatives, accuracy gain, and stability persist.
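
That experiment amounts to a temporal holdout: fit on packages collected before a cutoff date, evaluate only on packages published after it. A pure-Python sketch with synthetic records (names, dates, labels, and counts all invented):

```python
from datetime import date

# Synthetic records: (name, collection_date, is_malicious, suspicious_event_count)
records = [
    ("pkg-a", date(2025, 3, 1), False, 0),
    ("pkg-b", date(2025, 4, 2), True, 7),
    ("pkg-c", date(2025, 5, 9), False, 1),
    ("pkg-d", date(2025, 6, 20), True, 9),
    ("pkg-e", date(2026, 1, 5), False, 0),   # post-cutoff: unseen distribution
    ("pkg-f", date(2026, 2, 11), True, 6),   # post-cutoff: attack has drifted
]
cutoff = date(2025, 12, 31)
train = [r for r in records if r[1] <= cutoff]
test = [r for r in records if r[1] > cutoff]

# "Train": pick the count threshold that separates the pre-cutoff labels.
threshold = min(c for _, _, mal, c in train if mal)

# Evaluate only on packages gathered after the cutoff.
correct = sum((c >= threshold) == mal for _, _, mal, c in test)
print(f"temporal-holdout accuracy: {correct}/{len(test)}")
```

With these synthetic records the post-cutoff malicious package drifts just below the learned threshold, so accuracy falls to 1/2. That drop is exactly the failure mode a temporal holdout is designed to expose and a fixed-dataset split can hide.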

Figures

Figures reproduced from arXiv:2604.26219, by Abu Bakar Siddique Mahi, Chadni Islam, Gowri Ramachandran, Raja Jurdak, and Sk Tanzir Mehedi. Captions are as extracted; several are truncated.

Figure 1: Overall system architecture of the proposed eDySec
Figure 2: Proposed eDySec framework for detecting mali…
Figure 3: Overview of the QUT-DV25 dataset: (a) statistics of…
Figure 4: t-SNE visualization of 200 randomly selected sam…
Figure 5: Performance comparison of feature selection meth…
Figure 6: Performance of FLAML-based MLP model on the…
Figure 7: Performance of FLAML-based MLP model on the…
Figure 8: Comparison of (a) FPR and FNR across trace types…
Figure 9: Global SHAP summary of the most influential fea…
Figure 10: SHAP waterfall explanations for representative…
Figure 11: Local explanations for representative samples. (a)…

A second extracted set restarts the numbering with full captions:

Figure 1: Performance comparison of different feature selection methods on the QUT-DV25 dataset using MLP model.
Figure 2: Performance of the FLAML-DL models on the QUT-DV25 Combined dataset: (a) accuracy and (b) loss.
Figure 3: Performance of the FLAML-DL models on the QUT-DV25 Combined dataset: (a) confusion matrix and (b) ROC curve.
Figure 4: Performance of the FLAML-DL models on the Pattern trace dataset: (a) accuracy and (b) loss.
Figure 5: Performance of the FLAML-DL models on the Pattern trace dataset: (a) confusion matrix and (b) ROC curve.
read the original abstract

The security of open-source software repositories is increasingly threatened by next-gen software supply chain attacks. These attacks include multiphase malware execution, remote access activation, and dynamic payload generation. Traditional Machine Learning (ML) detectors struggle to detect these attacks due to the high-dimensional and sparse nature of dynamic behavioral data, including system calls, network traffic, directory access patterns, and dependency logs. As a result, these data characteristics degrade the performance, stability, and explainability of ML models. These challenges have made Deep Learning (DL) a promising alternative, given its success across various domains and its potential for modeling complex patterns. This paper presents eDySec, a DL-based efficient, stable, and explainable framework for dynamic behavioral analysis to detect malicious packages. Using the QUT-DV25 dataset, which captures both install-time and post-installation behaviors of packages, we evaluate DL models and investigate feature sets to identify the most discriminative attributes for enabling efficient malicious package detection. Additionally, model stability analysis and explainable AI techniques are incorporated into the detection pipeline to enable stable, and transparent interpretations of model decisions. Experimental results demonstrate that eDySec significantly outperforms the state-of-the-art frameworks. Specifically, it halves feature dimensionality while lowering false positives by 82% and false negatives by 79%. It also improves accuracy by 3%, achieves near-perfect stability, and maintains an inference latency of 170ms per package. Further analysis reveals that feature and model selection play a critical role, as certain combinations degrade performance. Ultimately, this study advances the understanding of the strengths and limitations of dynamic analysis against next-gen attacks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper presents eDySec, a deep learning-based explainable dynamic analysis framework for detecting malicious packages in the PyPI ecosystem. It evaluates DL models and feature sets on the QUT-DV25 dataset (capturing install-time and post-install behaviors), incorporates stability analysis and XAI techniques, and claims to outperform SOTA by halving feature dimensionality, reducing false positives by 82%, false negatives by 79%, improving accuracy by 3%, achieving near-perfect stability, and maintaining 170ms inference latency per package.

Significance. If the empirical results prove robust, this would represent a meaningful advance in software supply-chain security by tackling high-dimensional sparse dynamic data (system calls, network traffic, etc.) with efficient, stable, and interpretable DL models. The emphasis on next-generation attacks and the combination of performance, stability, and explainability could inform practical detectors for open-source repositories.

major comments (3)
  1. [Experimental results] The experimental evaluation provides no details on QUT-DV25 dataset size, collection window, labeling source or process, or train/test split strategy. These omissions are load-bearing because the headline claims (82% FP reduction, 79% FN reduction, halving of dimensionality) cannot be assessed for generalizability or absence of collection artifacts without this information.
  2. [Feature and model selection] It is not stated whether feature selection (the process that halves dimensionality) was performed inside or outside the cross-validation loop. If performed on the full dataset, the reported performance deltas and stability results are at risk of optimistic bias and may not reflect true out-of-sample behavior on the sparsity profile of QUT-DV25.
  3. [Results and discussion] The state-of-the-art baselines used for comparison are not described in sufficient detail (implementation, hyper-parameters, or exact experimental conditions), preventing verification of the claimed 3% accuracy improvement and the 82%/79% FP/FN reductions.
minor comments (1)
  1. [Abstract] The abstract contains a minor grammatical issue ('stable, and transparent' should read 'stable and transparent').

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and have revised the manuscript to enhance reproducibility and clarity.

read point-by-point responses
  1. Referee: The experimental evaluation provides no details on QUT-DV25 dataset size, collection window, labeling source or process, or train/test split strategy. These omissions are load-bearing because the headline claims (82% FP reduction, 79% FN reduction, halving of dimensionality) cannot be assessed for generalizability or absence of collection artifacts without this information.

    Authors: We agree that these details are necessary to evaluate generalizability and rule out artifacts. The revised manuscript adds a dedicated subsection in the Experimental Setup that reports the QUT-DV25 dataset size, collection window and methodology, labeling source and process, and the train/test split strategy (including stratification and ratio). revision: yes

  2. Referee: It is not stated whether feature selection (the process that halves dimensionality) was performed inside or outside the cross-validation loop. If performed on the full dataset, the reported performance deltas and stability results are at risk of optimistic bias and may not reflect true out-of-sample behavior on the sparsity profile of QUT-DV25.

    Authors: We appreciate the concern about potential leakage. Feature selection was performed inside the cross-validation loop on training folds only. The revised manuscript explicitly states this in the Feature Selection subsection, describes the method used, and reports the average dimensionality reduction observed across folds. revision: yes

  3. Referee: The state-of-the-art baselines used for comparison are not described in sufficient detail (implementation, hyper-parameters, or exact experimental conditions), preventing verification of the claimed 3% accuracy improvement and the 82%/79% FP/FN reductions.

    Authors: We agree that additional detail is required for verification. The revised manuscript expands the Baselines subsection to include implementation details (libraries and versions), hyper-parameter settings, and the exact experimental conditions (same splits and preprocessing) under which the baselines were run. revision: yes
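
The inside-versus-outside-the-CV-loop distinction in point 2 is mechanical once the selector lives in a pipeline: each fold then refits selection on its own training data. A generic scikit-learn sketch of the contrast on pure noise (not the authors' code or data):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))    # pure noise features
y = rng.integers(0, 2, size=200)   # labels independent of X

# LEAKY: selecting on the full dataset before CV lets label noise into the folds.
leaky_idx = np.argsort(f_classif(X, y)[0])[-10:]
leaky = cross_val_score(LogisticRegression(max_iter=1000),
                        X[:, leaky_idx], y, cv=5).mean()

# CORRECT: selection happens inside each CV training fold via the pipeline.
pipe = make_pipeline(SelectKBest(f_classif, k=10),
                     LogisticRegression(max_iter=1000))
honest = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky CV accuracy:  {leaky:.2f}")   # typically inflated above chance
print(f"honest CV accuracy: {honest:.2f}")  # typically near 0.5 on pure noise
```

On features carrying no signal at all, the leaked variant still scores above chance, which is the optimistic bias the referee is asking the authors to rule out.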

Circularity Check

0 steps flagged

No circularity: empirical performance claims rest on dataset evaluation

full rationale

The paper presents eDySec as a DL framework evaluated on the QUT-DV25 dataset for malicious package detection, reporting empirical metrics such as halved feature dimensionality, 82% lower false positives, 79% lower false negatives, 3% accuracy improvement, near-perfect stability, and 170ms latency. No mathematical derivation chain, equations, or self-referential definitions are present in the provided text. Performance claims are framed as experimental outcomes from model training and testing rather than predictions derived from fitted parameters or self-citations that reduce to the inputs by construction. The central results depend on external dataset evaluation and comparisons to SOTA, which are falsifiable and not tautological.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The work applies established deep learning and XAI methods to a security dataset without introducing new mathematical constructs or entities. The main dependencies are on the dataset and standard ML assumptions.

free parameters (2)
  • Deep learning model hyperparameters
    Tuned to optimize detection performance on the QUT-DV25 dataset as part of the evaluation.
  • Feature selection criteria
    Determines which behavioral attributes are most discriminative, leading to the reported dimensionality reduction.
axioms (1)
  • domain assumption: high-dimensional, sparse dynamic behavioral data from packages can be modeled effectively by deep learning for malicious-package detection.
    This underpins the choice of DL over traditional ML, as stated in the abstract.
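
The first free parameter is the one the paper itself flags: certain model-feature combinations degrade performance. A grid search makes that sensitivity visible as the spread between best and worst configurations. Illustrative scikit-learn sketch on a synthetic task; the search space is hypothetical, not the authors':

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic separable task

# Hypothetical hyperparameter grid; the degradation claim predicts a spread
# between the best and worst mean cross-validated scores.
grid = GridSearchCV(
    MLPClassifier(max_iter=800, random_state=0),
    param_grid={"hidden_layer_sizes": [(4,), (32,), (64, 32)],
                "alpha": [1e-4, 1e-1]},
    cv=3,
)
grid.fit(X, y)
scores = grid.cv_results_["mean_test_score"]
print(f"best {scores.max():.2f} vs worst {scores.min():.2f} across combinations")
```

Reporting the whole spread, rather than only the winning configuration, is what turns "feature and model selection play a critical role" from a caveat into a measurement.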

pith-pipeline@v0.9.0 · 5620 in / 1424 out tokens · 109171 ms · 2026-05-07T13:32:37.334890+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

70 extracted references · 33 canonical work pages · 1 internal anchor
