MOLOT System Card: Malicious Operational Logic Observation Transformer
Pith reviewed 2026-06-27 21:37 UTC · model grok-4.3
The pith
MOLOT detects malicious packages by modeling behavior sequences from static call graphs and mapping suspicious activities back to source locations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MOLOT represents source code as behavior sequences derived from static call graphs, classifies them via a transformer to separate malicious from benign packages, and supplies explanations by ranking suspicious behavior activities and mapping them to concrete source-code locations. According to the paper, on PyPI and npm packages the system is accurate while staying within the runtime, memory, and false-positive limits observed in a production moderation workflow.
What carries the argument
Behavior sequences extracted from static call graphs, processed by the Malicious Operational Logic Observation Transformer, together with a ranking-based explanation stage that links flagged activities back to source locations.
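To make the idea of a behavior sequence concrete, here is a minimal sketch in Python. It is not MOLOT's extractor: the `BEHAVIOR_LABELS` table, the label names, and the helper functions are hypothetical, and the sketch walks call sites in flat source order rather than traversing a real call graph. It only shows how recognized calls could become an ordered list of labels that still carry their source locations.

```python
import ast

# Hypothetical mapping from call names to coarse behavior labels.
# This is NOT MOLOT's taxonomy; it is invented for illustration.
BEHAVIOR_LABELS = {
    "open": "FILE_ACCESS",
    "os.system": "PROCESS_SPAWN",
    "subprocess.run": "PROCESS_SPAWN",
    "urllib.request.urlopen": "NETWORK_REQUEST",
    "base64.b64decode": "DECODE",
    "exec": "DYNAMIC_EXEC",
    "eval": "DYNAMIC_EXEC",
}


def dotted_name(node):
    """Return 'a.b.c' for Name/Attribute chains, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def behavior_sequence(source, filename="<memory>"):
    """List (label, filename, lineno) for each recognized call, in source order."""
    calls = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Call)]
    calls.sort(key=lambda n: (n.lineno, n.col_offset))
    sequence = []
    for call in calls:
        label = BEHAVIOR_LABELS.get(dotted_name(call.func))
        if label:
            sequence.append((label, filename, call.lineno))
    return sequence


SAMPLE = '''import base64, os
payload = base64.b64decode("ZWNobyBoaQ==")
os.system(payload.decode())
'''

# The sample is only parsed, never executed.
sequence = behavior_sequence(SAMPLE, "setup.py")
print(sequence)
```

Keeping the file name and line number next to each label is what lets a later explanation stage point back at code; a classifier would consume only the label column.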
If this is right
- Detection becomes possible inside SAST tools that lack package metadata or execution traces.
- Explanations can directly support human review by pointing to the exact code locations driving the flag.
- The released Open Malicious-Code Bench supplies a shared test set for comparing future static detectors.
- Performance under measured runtime and memory limits allows direct insertion into existing DevSecOps moderation queues.
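The second consequence above, explanations that point reviewers at code locations, can be pictured with a small ranking sketch. The `Activity` record, its `score` field, and `top_findings` are assumptions made for illustration; the source does not say how MOLOT scores activities, so the scores here are simply given as inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    label: str
    path: str
    line: int
    score: float  # hypothetical suspicion score in [0, 1], supplied by some model


def top_findings(activities, k=3):
    """Return the k highest-scoring activities as 'path:line label (score)' strings."""
    ranked = sorted(activities, key=lambda a: a.score, reverse=True)
    return [f"{a.path}:{a.line} {a.label} ({a.score:.2f})" for a in ranked[:k]]


activities = [
    Activity("FILE_ACCESS", "pkg/util.py", 12, 0.10),
    Activity("PROCESS_SPAWN", "setup.py", 3, 0.91),
    Activity("DECODE", "setup.py", 2, 0.74),
]
report = top_findings(activities, k=2)
for entry in report:
    print(entry)
```

A reviewer reading such a report would open the listed lines first, which is the workflow the explanation stage is meant to support.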
Where Pith is reading between the lines
- The same sequence representation could be combined with dynamic traces when those become available, potentially raising detection rates further.
- Extending the call-graph extraction to additional languages would test whether the approach generalizes beyond Python and JavaScript.
- The public benchmark may encourage development of lighter-weight models that still retain the explanation feature.
- If the sequences prove stable across package versions, the method could support continuous monitoring of supply-chain updates.
Load-bearing premise
Behavior sequences taken from static call graphs alone are enough to tell malicious packages from benign ones when metadata and dynamic traces cannot be used.
What would settle it
A production run on newly arriving PyPI or npm packages in which MOLOT either misses confirmed malicious samples or exceeds the false-positive rate tolerated by the moderation team.
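The test described here reduces to two numbers from a labelled production run: how many confirmed malicious packages were missed, and whether the false-positive rate stays within what moderators tolerate. A minimal sketch follows; the function name, the `(truth, flagged)` input shape, and the `max_fpr` threshold are illustrative assumptions, not values from the paper.

```python
def moderation_check(results, max_fpr):
    """Summarize a labelled run.

    results: iterable of (is_malicious, was_flagged) boolean pairs.
    max_fpr: false-positive rate the moderation team tolerates (assumed input).
    """
    tp = fp = tn = fn = 0
    for is_malicious, was_flagged in results:
        if is_malicious and was_flagged:
            tp += 1
        elif is_malicious:
            fn += 1
        elif was_flagged:
            fp += 1
        else:
            tn += 1
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    miss_rate = fn / (fn + tp) if (fn + tp) else 0.0
    return {
        "fpr": fpr,
        "miss_rate": miss_rate,
        "missed": fn,
        "within_fpr_budget": fpr <= max_fpr,
    }


# Toy run: 2 malicious packages (1 caught, 1 missed), 98 benign (1 wrongly flagged).
run = [(True, True), (True, False)] + [(False, True)] + [(False, False)] * 97
summary = moderation_check(run, max_fpr=0.02)
print(summary)
```

Either a non-zero `missed` count on confirmed samples or `within_fpr_budget` being false would count against the load-bearing premise.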
Original abstract
MOLOT (Malicious Operational Logic Observation Transformer) is a static malicious-code detection system designed for SAST settings where package metadata, maintainer history, and dynamic execution traces may be unavailable or unreliable. The system represents source code as behavior sequences derived from static call graphs and includes an explanation stage that ranks suspicious behavior activities and maps them back to source-code locations. The approach is evaluated on Python and JavaScript packages from PyPI and npm, compared with open-source detection tools, and validated under product constraints including runtime, memory use, and false-positive rates observed in a real moderation workflow. We also release Open Malicious-Code Bench, a public benchmark for reproducible evaluation of malicious-package detection methods. The results show that static behavior-sequence modeling can provide accurate, explainable, and deployable malicious-code detection for modern DevSecOps workflows.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. MOLOT is a static malicious-code detection system for SAST scenarios where package metadata, maintainer history, and dynamic traces are unavailable. It represents code as behavior sequences from static call graphs, includes an explanation stage that ranks suspicious activities and maps them to source locations, evaluates on Python/JS packages from PyPI and npm against open-source tools, validates under product constraints (runtime, memory, false-positive rates in a real moderation workflow), and releases the Open Malicious-Code Bench benchmark. The central claim is that static behavior-sequence modeling yields accurate, explainable, and deployable detection for modern DevSecOps.
Significance. If the evaluation results hold, the work supplies a practical static-analysis method for malicious-package detection when dynamic execution or metadata are unreliable, together with a public benchmark that enables reproducible comparison. The release of Open Malicious-Code Bench is a concrete contribution to the field.
major comments (1)
- [call-graph construction] § on call-graph construction: the central claim requires that behavior sequences extracted solely from static call graphs suffice for accurate detection when metadata and dynamic traces are unavailable. In Python/JS, common malicious patterns (runtime eval/exec, __import__ indirection, string-based dispatch, or minified/obfuscated call sites) produce incomplete or identical static graphs for malicious and benign code. If the Open Malicious-Code Bench or the reported experiments do not contain a representative fraction of such cases, or if the sequence extraction collapses these patterns, the accuracy and deployability results do not generalize to the stated threat model.
minor comments (1)
- [Abstract] Abstract: asserts evaluation results and deployability but supplies no quantitative metrics, baselines, dataset sizes, or methodology details; these should be summarized even at the abstract level for a system card.
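The major comment's concern about dynamic dispatch can be illustrated with a small Python sketch. The `static_call_names` helper is a deliberately naive, purely syntactic pass written for this illustration; it is not MOLOT's extractor, which the authors say applies additional heuristics. The two snippets are parsed only, never executed.

```python
import ast


def static_call_names(source):
    """Names of call targets visible to a purely syntactic pass, in source order."""
    calls = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Call)]
    calls.sort(key=lambda n: (n.lineno, n.col_offset))
    names = []
    for call in calls:
        func = call.func
        if isinstance(func, ast.Name):
            names.append(func.id)
        elif isinstance(func, ast.Attribute):
            names.append(func.attr)
        else:
            # The callee is itself an expression, e.g. the result of getattr(...).
            names.append("<dynamic>")
    return names


# Same shape, very different intent: math.sqrt(4) versus os.system("id").
BENIGN = 'getattr(__import__("ma" + "th"), "sq" + "rt")(4)\n'
MALICIOUS = 'getattr(__import__("o" + "s"), "sys" + "tem")("id")\n'

benign_names = static_call_names(BENIGN)
malicious_names = static_call_names(MALICIOUS)
print(benign_names == malicious_names)
```

Both snippets yield the same list of names, so any sequence built from these names alone cannot separate them. This is the collision case the referee asks the benchmark to cover.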
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for major revision. We address the single major comment point-by-point below.
Referee: [call-graph construction] § on call-graph construction: the central claim requires that behavior sequences extracted solely from static call graphs suffice for accurate detection when metadata and dynamic traces are unavailable. In Python/JS, common malicious patterns (runtime eval/exec, __import__ indirection, string-based dispatch, or minified/obfuscated call sites) produce incomplete or identical static graphs for malicious and benign code. If the Open Malicious-Code Bench or the reported experiments do not contain a representative fraction of such cases, or if the sequence extraction collapses these patterns, the accuracy and deployability results do not generalize to the stated threat model.
Authors: We agree that this is a central validity concern for any static-analysis claim. The Open Malicious-Code Bench was deliberately seeded with malicious packages exhibiting runtime eval/exec, __import__ indirection, string-based dispatch, and minification/obfuscation drawn from real PyPI and npm incidents; the static extractor incorporates name-resolution and limited constant-propagation heuristics to recover indirect targets where possible. Nevertheless, we acknowledge that highly adversarial obfuscation can still produce incomplete or colliding graphs. We will revise the manuscript to (1) report detection metrics stratified by obfuscation level within the benchmark, (2) add an explicit limitations subsection quantifying the fraction of samples where static graphs become indistinguishable, and (3) clarify the precise threat model under which the reported accuracy and false-positive rates are claimed to hold. These changes will be reflected in the next version.
Revision: yes
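As a rough picture of what "limited constant-propagation" could mean for `__import__` indirection, the sketch below folds `+` chains of string literals and reports which import targets become statically known. The functions `fold_str` and `resolved_imports` are hypothetical names for this illustration; the rebuttal does not describe the authors' actual heuristics, and this sketch handles only literal concatenation.

```python
import ast


def fold_str(node):
    """Fold a string literal or a '+' chain of string literals; None if not constant."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        left, right = fold_str(node.left), fold_str(node.right)
        if left is not None and right is not None:
            return left + right
    return None


def resolved_imports(source):
    """Module names passed to __import__, or None where the argument does not fold."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "__import__"
            and node.args
        ):
            found.append(fold_str(node.args[0]))
    return found


# The source is parsed only, never executed.
resolved = resolved_imports('__import__("o" + "s")\n__import__(input())\n')
print(resolved)
```

The first call folds to a known module name, while the second depends on runtime input and stays unresolved. Cases of the second kind are the ones a limitations subsection would need to count.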
Circularity Check
No significant circularity in derivation chain
The paper describes a static detection system using behavior sequences from call graphs, with evaluation on external PyPI/npm packages and comparison to open-source tools. No equations, fitted parameters renamed as predictions, self-definitional steps, or load-bearing self-citations are present in the abstract or description. Claims rest on external benchmarks and product constraints rather than internal reductions to inputs.
Reference graph
Works this paper leans on
[1] K. Dholakia and I. Jaffer, “Security update: Suspected supply chain incident,” LiteLLM Blog, Mar. 2026. [Online]. Available: https://docs.litellm.ai/blog/security-update-march-2026
[2] C. Tafani-Dereeper and S. Obregoso, “The Shai-Hulud 2.0 NPM worm: analysis, and what you need to know,” Datadog Security Labs, Nov. [Online]. Available: https://securitylabs.datadoghq.com/articles/shai-hulud-2.0-npm-worm/
[4] J. Zhang, K. Huang, Y. Huang, B. Chen, R. Wang, C. Wang, and X. Peng, “Killing two birds with one stone: Malicious package detection in NPM and PyPI using a single model of malicious behavior sequence,” ACM Transactions on Software Engineering and Methodology, 2024, arXiv:2309.02637. [Online]. Available: https://dl.acm.org/doi/full/10.1145/3705304
[5] X. Sun, X. Gao, S. Cao, L. Bo, X. Wu, and K. Huang, “1+1>2: Integrating deep code behaviors with metadata features for malicious PyPI package detection,” in Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2024. [Online]. Available: https://dl.acm.org/doi/10.1145/3691620.3695493
[6] T. Iqbal, G. Wu, and Z. Iqbal, “CLAMPD-Net: Cross-language malicious package detection across PyPI and NPM with multimodal fusion,” Information and Software Technology, 2026. [Online]. Available: https://www.sciencedirect.com/science/article/abs/pii/S0950584926001187
[7] Y. Huang, R. Wang, W. Zheng, Z. Zhou, S. Wu, S. Ke, B. Chen, S. Gao, and X. Peng, “SpiderScan: Practical detection of malicious NPM packages based on graph-based behavior modeling and matching,” in Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2024. [Online]. Available: https://dl.acm.org/doi/10.1145/36...
[8] S. T. Mehedi, C. Islam, G. Ramachandran, and R. Jurdak, “DySec: A machine learning-based dynamic analysis for detecting malicious packages in PyPI ecosystem,” 2025. [Online]. Available: https://arxiv.org/abs/2503.00324
[9] M. Ibiyo, T. Louangdy, P. T. Nguyen, C. Di Sipio, and D. Di Ruscio, “Detecting malicious source code in PyPI packages with LLMs: Does RAG come in handy?” 2025. [Online]. Available: https://arxiv.org/abs/2504.13769
[10] N. Zahan, P. Burckhardt, M. Lysenko, F. Aboukhadijeh, and L. Williams, “Leveraging large language models to detect NPM malicious packages,” in Proceedings of the IEEE/ACM 47th International Conference on Software Engineering (ICSE), 2025. [Online]. Available: https://arxiv.org/abs/2403.12196
[11] GitHub, Inc., “CodeQL: Semantic code analysis engine,” GitHub repository. [Online]. Available: https://github.com/github/codeql
[12] Semgrep, Inc., “Semgrep: Lightweight static analysis for many languages,” GitHub repository, 2024. [Online]. Available: https://github.com/semgrep/semgrep
[13] D.-L. Vu, “bandit4mal: A fork of Bandit with patterns to identify malicious Python code,” GitHub repository. [Online]. Available: https://github.com/lyvd/bandit4mal
[14] W. Guo, Z. Chen, Z. Xu, C. Liu, M. Kang, S. Song, C. Liu, Y. Xu, W. Sun, and Y. Liu, “Understanding NPM malicious package detection: A benchmark-driven empirical analysis,” 2026. [Online]. Available: https://arxiv.org/abs/2603.27549
[15] A. Ryan, J. M. Ifti, M. Erfan, A. A. U. Rahman, and M. R. Rahman, “Unveiling malicious logic: Towards a statement-level taxonomy and dataset for securing Python packages,” 2025. [Online]. Available: https://arxiv.org/abs/2512.12559
[16] P. Ladisa, S. E. Ponta, N. Ronzoni, M. Martinez, and O. Barais, “On the feasibility of cross-language detection of malicious packages in NPM and PyPI,” in Proceedings of the 39th Annual Computer Security Applications Conference (ACSAC), 2023. [Online]. Available: https://arxiv.org/abs/2310.09571
[17] Microsoft, “OSSGadget: Collection of tools for analyzing open source packages,” GitHub repository. [Online]. Available: https://github.com/microsoft/OSSGadget
[18] Microsoft, “Application inspector: A source-code analyzer for surveying features,” GitHub repository. [Online]. Available: https://github.com/microsoft/ApplicationInspector
[19] D.-L. Vu, Z. Newman, and J. S. Meyers, “A benchmark comparison of Python malware detection approaches,” 2022. [Online]. Available: https://arxiv.org/abs/2209.13288
[20] Apiiro Ltd., “malicious-code-ruleset: Semgrep rules for detecting malicious code in OSS packages,” GitHub repository. [Online]. Available: https://github.com/apiiro/malicious-code-ruleset
Appendix A: Leakage of File Identifiers in Early Activity Chains
Symptom. In early versions of the pipeline, activity chains contained the entrypoint identifier, specifically,...