PRISM: PE Relational Inter-Section Matrix. A 2D Section-Aware Dataset for Static PE Malware Detection

Ana I. González-Tablas; José M. Sacristán

arXiv: 2606.27109 · v1 · pith:MK2YNEGE · submitted 2026-06-25 · 💻 cs.CR



Pith reviewed 2026-06-26 03:29 UTC · model grok-4.3

classification 💻 cs.CR

keywords: malware detection, PE files, static analysis, section ordering, 2D representation, EMBER comparison, Windows binaries, feature matrix


The pith

PRISM's ordered 2D matrix of PE sections recovers nearly all of EMBER's detection performance at roughly one-sixth the dimensionality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PRISM as a 2D matrix that represents each Windows PE file by its sections in their original file order, plus one summary row. It uses separability measures to show that keeping this positional structure captures information that flat one-dimensional vectors lose. In head-to-head tests on matched samples, the same gradient-boosted classifier reaches almost the same malware-versus-benign detection performance with the compact PRISM features as with the much larger EMBER vector. The work states that the basic detection task is already saturated and therefore reserves the extra structural detail for harder problems such as family identification.

Core claim

PRISM encodes every PE binary as a two-dimensional matrix whose rows are the individual sections in file order together with a global summary row. Formal separability analysis demonstrates that the per-section positional structure carries discriminative information that flat representations cannot capture. Under strictly controlled sample-matched comparison, a gradient-boosted classifier on the compact PRISM representation recovers nearly all of the binary-detection performance of the same classifier on the much larger EMBER vector at roughly one-sixth the dimensionality, with the two representations operationally indistinguishable at the decision threshold.

What carries the argument

The PRISM matrix: a 2D representation with rows as PE sections in file order plus a summary row that preserves compatibility with existing flat-vector models.
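To make the layout concrete, here is a minimal Python sketch of how such a matrix could be assembled. It assumes sections have already been parsed (for example with LIEF [21]) into simple records. Only `raw_size` appears in the paper's Figure 7 caption; the other per-section features, the hypothetical `MAX_SECTIONS` cap, the zero-padding, and the contents of the summary row are illustrative assumptions, not PRISM's actual schema.

```python
import math
from collections import Counter
from dataclasses import dataclass

# Hypothetical per-section record; a real extractor (e.g. one built on LIEF)
# would fill these fields from the parsed PE section table, in file order.
@dataclass
class Section:
    name: str
    raw_size: int
    virtual_size: int
    data: bytes

MAX_SECTIONS = 8  # hypothetical cap on section rows; PRISM's real value may differ
N_FEATURES = 3    # raw_size, virtual_size, entropy (illustrative subset)

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def section_row(sec: Section) -> list[float]:
    """One matrix row: the illustrative features of a single section."""
    return [float(sec.raw_size), float(sec.virtual_size), byte_entropy(sec.data)]

def build_matrix(sections: list[Section]) -> list[list[float]]:
    """Rows = sections in file order (truncated/zero-padded), then one summary row."""
    rows = [section_row(s) for s in sections[:MAX_SECTIONS]]
    # Zero-pad so every binary yields the same matrix shape.
    rows += [[0.0] * N_FEATURES for _ in range(MAX_SECTIONS - len(rows))]
    # Hypothetical global summary row: section count, total raw size, max entropy.
    summary = [
        float(len(sections)),
        float(sum(s.raw_size for s in sections)),
        max((byte_entropy(s.data) for s in sections), default=0.0),
    ]
    return rows + [summary]

def flatten(matrix: list[list[float]]) -> list[float]:
    """Row-major flattening, so flat-vector (EMBER-style) models can consume it."""
    return [v for row in matrix for v in row]
```

Keeping the summary row the same width as the section rows is a convenience choice in this sketch; it lets the whole matrix be stored as one rectangular array and flattened without special cases.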

If this is right

The binary detection task is saturated, leaving PRISM's structural content for tasks with greater headroom such as family classification.
Architectures that operate directly on the 2D matrix structure become feasible without losing the performance already obtained.
The released corpus of 83,633 matrices and 49,204 family-filtered samples supports further experiments under open licences.
EMBER retains only a small, consistent advantage confined to the extreme low-false-positive regime.
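The "extreme low-false-positive regime" is usually summarized as the true-positive rate at a fixed false-positive budget (the referee below asks for TPR@FPR=0.001). A minimal stdlib sketch of that metric follows; the thresholding convention (flag a sample when its score is strictly above the threshold) is an assumption, and the scores would come from whichever classifier is being compared.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.001):
    """Highest TPR reachable by a score threshold whose FPR is <= max_fpr.

    labels: 1 = malware, 0 = benign. A sample is flagged when score > threshold.
    """
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    if not neg or not pos:
        raise ValueError("need both benign and malicious samples")
    # Number of benign samples we may flag without exceeding the FPR budget
    # (small epsilon guards against float products like 0.999999... -> 0).
    allowed_fp = int(max_fpr * len(neg) + 1e-9)
    # Threshold = score of the (allowed_fp + 1)-th highest benign sample;
    # flagging strictly above it yields at most allowed_fp false positives.
    threshold = neg[allowed_fp] if allowed_fp < len(neg) else float("-inf")
    return sum(s > threshold for s in pos) / len(pos)
```

Scoring both the PRISM and EMBER models with the same function on the same test samples keeps the low-FPR comparison like-for-like; note that at FPR=0.001 the benign test set needs thousands of samples before the budget admits even a handful of false positives.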

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same section-ordering principle could be applied to other executable formats that also contain ordered segments.
Models that process the matrix with 2D operations might extract additional signal beyond what gradient boosting achieves.
The inter-section information-gain metric offers a general way to quantify positional value in any ordered file format.

Load-bearing premise

The per-section ordering and relational context supply discriminative signals that a flat collection of the same features cannot recover.

What would settle it

A controlled experiment in which a flat feature vector of size comparable to PRISM achieves equal or higher detection accuracy than PRISM on the identical sample set.
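Such an experiment requires both representations to be scored on the identical sample set. One simple way to enforce that is sketched below, assuming (hypothetically) that each feature table is a dict keyed by the file's SHA-256 hex digest: intersect the keys, build aligned lists in a fixed order, and derive a deterministic train/test split from the digest so both representations share it.

```python
def match_samples(prism_rows: dict, flat_rows: dict, labels: dict):
    """Align two feature tables on their shared sample keys (e.g. SHA-256 hex).

    Returns (keys, prism_X, flat_X, y) in identical order, so the same
    classifier can be trained and scored on both representations fairly.
    """
    keys = sorted(set(prism_rows) & set(flat_rows) & set(labels))
    prism_X = [prism_rows[k] for k in keys]
    flat_X = [flat_rows[k] for k in keys]
    y = [labels[k] for k in keys]
    return keys, prism_X, flat_X, y

def hash_split(keys, n_folds=5):
    """Deterministic train/test split shared by both representations.

    Assumes keys are hex digests; the first 8 hex chars decide the fold,
    and fold 0 becomes the test set. Returns index lists into `keys`.
    """
    train, test = [], []
    for i, k in enumerate(keys):
        (test if int(k[:8], 16) % n_folds == 0 else train).append(i)
    return train, test
```

Splitting on the digest rather than on a random seed means the test set cannot drift between the PRISM run and the EMBER run, which is the property a sample-matched comparison depends on.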

Figures

Figures reproduced from arXiv: 2606.27109 by Ana I. González-Tablas, José M. Sacristán.

**Figure 3.** Top 15 malware families in BODMAS by sample

**Figure 4.** Top 20 malware families in the family-filtered corpus

**Figure 5.** Per-feature FDR comparison on the 49,204-sample …

**Figure 6.** FDR (left) and MI (right) heatmaps over the …

**Figure 7.** Top 20 inter-section feature pairs by ∆I on the 49,204-sample family-filtered corpus. Each bar represents a pair of cells (section_a, feature_a) × (section_b, feature_b) whose joint MI exceeds the best individual MI by the indicated margin. The top pair (raw_size@SEC2, name5@SEC3) contributes ∆I = 0.205 bits. The systematic presence of ∆I > 0.01 in 15.1% of all inter-section (section, feature) cell pairs constit…

**Figure 8.** Entropy profile by PE section position on the 49,204- …
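Figure 7's inter-section gain can be read as ∆I = I(X_a, X_b; Y) − max(I(X_a; Y), I(X_b; Y)), where X_a and X_b are two (section, feature) cells and Y is the class label. The sketch below computes that quantity with a simple plug-in estimator on already-discretized values. This is a swapped-in simplification for illustration: the paper cites the Kraskov et al. k-nearest-neighbour estimator [20], which may be what it actually uses, and any binning is assumed to happen upstream.

```python
import math
from collections import Counter

def mutual_info(xs, ys):
    """Plug-in mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # sum over observed (x, y): p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def delta_i(xa, xb, y):
    """Joint MI of a cell pair minus the better single-cell MI (Figure 7's ∆I)."""
    joint = mutual_info(list(zip(xa, xb)), y)
    return joint - max(mutual_info(xa, y), mutual_info(xb, y))
```

The XOR case shows why the quantity matters: if the label is the exclusive-or of two binary cells, each cell alone carries 0 bits about it while the pair carries 1 bit, so ∆I = 1. Plug-in estimates are biased upward when samples are few relative to the number of bins, so a cut-off such as the caption's ∆I > 0.01 is only meaningful with ample data per bin.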

Original abstract

We introduce PRISM (PE Relational Inter-Section Matrix), an open dataset and feature representation for static Windows PE malware detection. Existing benchmarks such as EMBER, BODMAS, and SOREL-20M represent each PE file as a flat one-dimensional feature vector, discarding the ordering of sections and the relational context between them. PRISM instead encodes every binary as a two-dimensional matrix whose rows are individual PE sections in file order, with a global summary row that preserves compatibility with EMBER-style models. We build the corpus from four malware sources (BODMAS, MalwareBazaar, VirusShare, and CAPE) together with SOREL-20M benign software, yielding 83,633 deduplicated matrices and a family-filtered analysis corpus of 49,204 samples across 684 malware families. A formal separability analysis (Fisher Discriminant Ratio, mutual information, and inter-section information gain) shows that the per-section positional structure carries discriminative information that flat representations cannot capture. Under a strictly controlled, sample-matched comparison, a gradient-boosted classifier on the compact PRISM representation recovers nearly all of the binary-detection performance of the same classifier on the much larger EMBER vector, at roughly one-sixth the dimensionality; EMBER retains only a small, consistent advantage confined to the extreme low-false-positive regime, the two being operationally indistinguishable at the decision threshold. We are explicit that this binary task is saturated, so the structural content PRISM preserves is reserved for tasks with greater metric headroom, such as family classification and architectures that exploit the 2D structure directly. The dataset, extraction library, trained models, and full analysis pipeline are released under CC BY-NC-SA and MIT licences.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

PRISM gives a compact 2D section-ordered matrix for PE files that recovers nearly all of EMBER's binary detection performance at roughly one-sixth the dimensionality, plus an openly released corpus.


The new piece is the 2D matrix that keeps PE sections in file order plus a global summary row, built from four malware sources (BODMAS, MalwareBazaar, VirusShare, CAPE) and SOREL-20M benign software into 83,633 deduplicated samples. They run a sample-matched comparison with the same gradient-boosted classifier and show that the PRISM version trails the much larger EMBER vector by only a small, consistent margin that matters only at very low false-positive rates. The separability numbers (Fisher ratio, mutual information, inter-section gain) back the claim that section order and inter-section relations add signal that flat vectors miss.

Releasing the matrices, extraction library, models, and pipeline under CC BY-NC-SA and MIT is the practical win here; it lets others test the structure on family classification or 2D-aware models without rebuilding everything.

The main limitation is that binary detection is already saturated, so the real test is whether the preserved structure helps on harder tasks. The abstract flags this and points to family work, but the strength of that claim depends on how the 49k family-filtered set was built and whether selection effects from the four sources show up in the results. Minor data-construction details like exact deduplication thresholds would need checking in the full text.

This is for people working on static PE features who want something between flat vectors and raw bytes. A reader building detectors or testing structured representations would get concrete value from the dataset alone. It has enough new representation work and controlled evidence to deserve referee time rather than a desk reject.

Referee Report

2 major / 3 minor

Summary. The paper introduces PRISM, a 2D section-aware matrix representation for static PE malware detection that preserves the order and relational context of PE sections, unlike flat vectors in benchmarks like EMBER. It assembles a new corpus of 83,633 deduplicated samples from BODMAS, MalwareBazaar, VirusShare, CAPE, and SOREL-20M, with a family-filtered subset of 49,204 samples across 684 families. Through separability analysis using Fisher Discriminant Ratio, mutual information, and inter-section information gain, it shows that positional structure provides discriminative information. In controlled sample-matched experiments, a gradient-boosted classifier on the compact PRISM features achieves nearly equivalent binary malware detection performance to the same classifier on the larger EMBER vector at about one-sixth the dimensionality, with the representations being operationally similar at standard thresholds. The work emphasizes that the binary detection task is saturated and positions PRISM for more challenging tasks like family classification, releasing the dataset, library, models, and pipeline openly.

Significance. If the controlled comparison and separability results hold, this work is significant for demonstrating that a compact, structured 2D representation can retain nearly all binary-detection utility of much larger flat vectors while explicitly preserving positional and relational section information for future tasks with greater headroom (e.g., family classification). The open release of the full dataset, extraction library, trained models, and analysis pipeline under CC BY-NC-SA and MIT licenses is a clear strength that enables reproducibility and extension by the community.

major comments (2)

[Abstract] Abstract: the central performance claim is stated only qualitatively ('recovers nearly all', 'small, consistent advantage', 'operationally indistinguishable') without any numerical results such as AUC, TPR@FPR=0.001, or accuracy deltas; this makes it impossible to evaluate the strength of the 'nearly equivalent' assertion that underpins the dimensionality/performance tradeoff.
[§3] §3 (dataset construction, inferred from abstract): the description of deduplication across four malware sources plus SOREL-20M and the subsequent family-filtering step to 49,204 samples lacks any detail on the exact procedure (e.g., hash-based, fuzzy, or section-content matching) or the family-labeling criteria; without these, it is unclear whether selection effects could inflate the reported separability or classification parity.

minor comments (3)

[Abstract] The abstract introduces the 'global summary row' for EMBER compatibility but does not specify its construction (e.g., which statistics are aggregated or how it is concatenated); this should be clarified in the methods section.
Consider adding an explicit table (perhaps in §4) listing the exact dimensionality of the PRISM matrix versus the EMBER vector used in the matched experiment.
The separability metrics (Fisher Discriminant Ratio, mutual information, inter-section information gain) are named but their precise formulas and per-section versus global computation are not shown; a short methods subsection would improve clarity.
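For reference, the two-class Fisher Discriminant Ratio of a single feature is commonly defined as FDR = (μ₁ − μ₀)² / (σ₁² + σ₀²), the squared gap between class means over the summed class variances. The sketch below applies that textbook definition to one feature; whether PRISM uses exactly this variant (for example population versus sample variance) is not stated here and is an assumption.

```python
from statistics import fmean, pvariance

def fisher_ratio(values, labels):
    """Two-class Fisher Discriminant Ratio for one feature.

    FDR = (mean_1 - mean_0)^2 / (var_1 + var_0), using population variances.
    labels: 1 = malware, 0 = benign; both classes must be present.
    """
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    denom = pvariance(a) + pvariance(b)
    if denom == 0:
        # Constant within each class: perfectly separable unless the means match.
        return float("inf") if fmean(a) != fmean(b) else 0.0
    return (fmean(a) - fmean(b)) ** 2 / denom
```

Applied cell by cell (one section position and one feature at a time), this per-feature score could yield per-position heatmaps of the kind Figure 6's FDR panel appears to show, whereas applying it to a flat vector gives one score per global feature.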

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and will incorporate revisions to improve the clarity and evaluability of the manuscript.


Referee: [Abstract] Abstract: the central performance claim is stated only qualitatively ('recovers nearly all', 'small, consistent advantage', 'operationally indistinguishable') without any numerical results such as AUC, TPR@FPR=0.001, or accuracy deltas; this makes it impossible to evaluate the strength of the 'nearly equivalent' assertion that underpins the dimensionality/performance tradeoff.

Authors: We agree that the abstract would benefit from quantitative support for the performance claims. While the body of the manuscript reports specific metrics (including AUC, TPR at low FPR thresholds, and accuracy deltas from the controlled experiments), we will revise the abstract to include key numerical results such as the AUC values and TPR@FPR=0.001 to make the 'nearly equivalent' claim directly evaluable. revision: yes
Referee: [§3] §3 (dataset construction, inferred from abstract): the description of deduplication across four malware sources plus SOREL-20M and the subsequent family-filtering step to 49,204 samples lacks any detail on the exact procedure (e.g., hash-based, fuzzy, or section-content matching) or the family-labeling criteria; without these, it is unclear whether selection effects could inflate the reported separability or classification parity.

Authors: We acknowledge that additional procedural details are needed for full transparency. We will expand the dataset construction section to specify the deduplication method (SHA-256 hash-based exact matching across sources) and the family-labeling criteria (consensus labeling via AVClass on multi-engine AV reports). These additions will clarify the process and allow readers to assess potential selection effects. revision: yes
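If deduplication is indeed exact SHA-256 matching, as this simulated response states, it amounts to keeping the first occurrence of each digest across sources. A minimal stdlib sketch follows; the source-priority order and the `(source, bytes)` input shape are hypothetical.

```python
import hashlib

def dedup_by_sha256(samples):
    """Keep the first occurrence of each SHA-256 digest.

    samples: iterable of (source_name, file_bytes), in source-priority order.
    Returns {digest: source_name} for the retained samples.
    """
    kept = {}
    for source, data in samples:
        digest = hashlib.sha256(data).hexdigest()
        kept.setdefault(digest, source)  # later byte-identical copies are ignored
    return kept
```

Exact hashing removes only byte-identical files; near-duplicates such as repacked or re-signed variants survive, which is why the referee's question about fuzzy or section-content matching still bears on possible selection effects.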

Circularity Check

0 steps flagged

No significant circularity identified


The paper constructs PRISM as a new 2D matrix representation from independently sourced corpora (BODMAS, MalwareBazaar, VirusShare, CAPE, SOREL-20M) and evaluates it via direct empirical comparison to the external EMBER benchmark using standard gradient-boosted classifiers and separability metrics (Fisher Discriminant Ratio, mutual information). No equations, parameters, or claims reduce by construction to quantities defined inside the paper; the performance and separability results are computed from the assembled data and remain falsifiable against public external references. The argument chain is therefore self-contained and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work introduces a new dataset and feature representation without specifying any free parameters fitted to data, mathematical axioms beyond standard practices, or invented entities. It relies on existing malware sources and standard ML classifiers.

pith-pipeline@v0.9.1-grok · 5864 in / 1306 out tokens · 57355 ms · 2026-06-26T03:29:33.303497+00:00 · methodology


Reference graph

Works this paper leans on

23 extracted references

[1] H. S. Anderson and P. Roth, “EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models,” Apr. 2018.

[2] L. Yang, A. Ciptadi, I. Laziuk, A. Ahmadzadeh, and G. Wang, “BODMAS: An Open Dataset for Learning-Based Temporal Analysis of PE Malware,” in 2021 IEEE Security and Privacy Workshops (SPW). San Francisco, CA, USA: IEEE, May 2021, pp. 78–84.

[3] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, “LightGBM: A Highly Efficient Gradient Boosting Decision Tree,” in Advances in Neural Information Processing Systems 30 (NeurIPS 2017). Long Beach, CA, USA: Curran Associates, Inc., 2017, pp. 3146–3154.

[4] R. J. Joyce, G. Miller, P. Roth, R. Zak, E. Zaresky-Williams, H. Anderson, E. Raff, and J. Holt, “EMBER2024 — A Benchmark Dataset for Holistic Evaluation of Malware Classifiers,” in Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2. Toronto, ON, Canada: ACM, Aug. 2025, pp. 5516–5526.

[5] S. S. Abdulwahab, M. Z. Abdullah, and A. H. Sallomi, “Real-time malware prevention using gradient boosted decision trees on the EMBER 2024 dataset: A static analysis approach for Windows PE binaries,” International Journal of Intelligent Engineering and Systems, vol. 19, no. 6, pp. 748–762, 2026.

[6] R. Harang and E. M. Rudd, “SOREL-20M: A Large Scale Benchmark Dataset for Malicious PE Detection,” Dec. 2020.

[7] M. I. Yousuf, I. Anwer, T. Shakir, M. Siddiqui, and M. Shahid, “Multi-feature Dataset for Windows PE Malware Classification,” Oct. 2022.

[8] Q. Wang, H. Yan, C. Zhao, R. Mei, Z. Han, and Y. Zhou, “Measurement of Malware Family Classification on a Large-Scale Real-World Dataset,” in 2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). Wuhan, China: IEEE, Dec. 2022, pp. 1390–1397.

[9] T. Rezaei, F. Manavi, and A. Hamzeh, “A PE header-based method for malware detection using clustering and deep embedding techniques,” Journal of Information Security and Applications, vol. 60, p. 102876, Aug. 2021.

[10] A. Nakrošis, I. Lagzdinytė-Budnikė, A. Paulauskaitė-Tarasevičienė, G. Paulikas, and P. Dapkus, “Deep Learning-Based Malware Detection Using PE Headers,” in Information and Software Technologies (ICIST 2022), ser. Communications in Computer and Information Science, A. Lopata, D. Gudonienė, and R. Butkienė, Eds. Cham: Springer International Pub...

[11] N. Maleki, M. Bateni, and H. Rastegari, “An Improved Method for Packed Malware Detection using PE Header and Section Table Information,” International Journal of Computer Network and Information Security, vol. 11, no. 9, pp. 9–17, Sep. 2019.

[12] C. K. Yuk and C. J. Seo, “Static Analysis and Machine Learning-Based Malware Detection System using PE Header Feature Values,” International Journal of Innovative Research and Scientific Studies, vol. 5, no. 4, pp. 281–288, Oct. 2022.

[13] M. I. Yousuf, I. Anwer, A. Riasat, K. T. Zia, and S. Kim, “Windows malware detection based on static analysis with multiple features,” PeerJ Computer Science, vol. 9, p. e1319, Apr. 2023.

[14] N. I. Hasanah, G. P. Insany, I. L. Kharisma, and N. D. Rahayu, “Recent Advancements in Machine Learning Models for Malware Detection: A Systematic Literature Review,” in The 7th International Global Conference Series on ICT Integration in Technical Education & Smart Society. MDPI, Sep. 2025, p. 78.

[15] I. M. Malik Matin, I. Hermawan, S. D. Yulianti, I. A. Ahmad, Naurah, and Z. Azizah, “Image Representation Based Malware Detection Using Transfer Learning,” in 2025 IEEE Conference on Cloud and Big Data Computing (CBDCom). Hakodate, Japan: IEEE, Oct. 2025, pp. 136–142.

[16] T. H. Hai, V. Van Thieu, T. T. Duong, H. H. Nguyen, and E.-N. Huh, “A Proposed New Endpoint Detection and Response With Image-Based Malware Detection System,” IEEE Access, vol. 11, pp. 122859–122875, 2023.

[17] Y. Yu, B. Cai, K. Aziz, X. Wang, J. Luo, M. S. Iqbal, P. Chakrabarti, and T. Chakrabarti, “Semantic lossless encoded image representation for malware classification,” Scientific Reports, vol. 15, no. 1, p. 7997, Mar. 2025.

[18] Y. Zhao, C. Guo, Y. Ping, Y. Chen, Y. Cui, and G. Shen, “MCPDS: Image-based malware classification method using PE metadata alone,” Cybersecurity, vol. 9, no. 1, p. 34, Feb. 2026.

[19] M. I. El-Hajj, “Hybrid Malware Classification using Static and Dynamic Features with Machine Learning,” in 2025 12th International Conference on Wireless Networks and Mobile Communications (WINCOM). Riyadh, Saudi Arabia: IEEE, Nov. 2025, pp. 1–8.

[20] A. Kraskov, H. Stögbauer, and P. Grassberger, “Estimating mutual information,” Physical Review E, vol. 69, no. 6, p. 066138, Jun. 2004.

[21] R. Thomas, “LIEF — Library to Instrument Executable Formats,” https://github.com/lief-project/LIEF, 2017, version 0.14.1.

[22] abuse.ch, “MalwareBazaar — A Project from abuse.ch,” https://bazaar.abuse.ch/, 2020, accessed: May 2025.

[23] J.-M. Godwin, “VirusShare.com,” https://virusshare.com/, 2012, accessed: May 2025.
