The Role of Domain-Specific Features in Malware Detection: A macOS Case Study
Pith reviewed 2026-06-28 09:46 UTC · model grok-4.3
The pith
Domain-specific macOS features enable a malware detector to reach 98.50 percent detection performance and to maintain that performance on new samples.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that static domain-specific features extracted from macOS Mach-O binaries (embedded certificates, entitlements, persistence techniques and key system APIs) enable a machine learning detector to achieve 98.50 percent detection performance on a dataset of 41,129 samples and 99.50 percent on a temporal holdout of 9,000 fresh executables. These results outperform prior approaches by 16 percent on average and by 50 percent, respectively. The domain-specific features prove essential: removing them causes a 15.92 percent drop in detection on novel samples.
What carries the argument
The set of static domain-specific features for macOS binaries, such as embedded certificates, entitlements, persistence techniques, and key system APIs, which capture operating-system-unique traits for classifying unknown samples.
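To make the idea of a static, platform-specific feature set concrete, the sketch below turns already-parsed Mach-O facts into a binary feature vector. Everything in it is an assumption for illustration: the `MachOInfo` record, its field names, and the three watch lists are hypothetical and are not the paper's feature definitions or dataset schema.

```python
from dataclasses import dataclass, field


# Hypothetical record of facts already parsed from a Mach-O binary.
# Field names are illustrative; they are not the paper's schema.
@dataclass
class MachOInfo:
    is_signed: bool = False
    entitlements: set[str] = field(default_factory=set)
    imported_symbols: set[str] = field(default_factory=set)
    strings: list[str] = field(default_factory=list)


# Illustrative watch lists (assumptions, not the paper's feature lists).
ENTITLEMENTS = [
    "com.apple.security.app-sandbox",
    "com.apple.security.get-task-allow",
]
APIS = ["_ptrace", "_sysctl", "_dlopen"]
PERSISTENCE_MARKERS = ["LaunchAgents", "LaunchDaemons"]


def to_feature_vector(info: MachOInfo) -> list[int]:
    """Encode certificate, entitlement, API and persistence hints as 0/1 flags."""
    vec = [int(info.is_signed)]
    vec += [int(e in info.entitlements) for e in ENTITLEMENTS]
    vec += [int(a in info.imported_symbols) for a in APIS]
    # A persistence marker counts if it appears in any embedded string.
    vec += [int(any(m in s for s in info.strings)) for m in PERSISTENCE_MARKERS]
    return vec
```

A vector of this kind can be concatenated with generic binary features before training, which is also what makes a later ablation (dropping only these columns) straightforward.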
If this is right
- Incorporating macOS-specific static traits produces higher detection rates than generic binary features alone.
- The same features support stronger generalization when the model encounters malware samples collected after the training period.
- Prior detectors that omit these platform traits will continue to show lower performance on macOS executables.
- Releasing the labeled dataset allows other researchers to test and extend models that use these features.
Where Pith is reading between the lines
- Comparable platform-native feature sets may improve detection accuracy on other operating systems whose binary formats have received less study.
- Tracking which macOS-specific attributes contribute most to decisions could help design lightweight rules that target common persistence methods.
- If malware authors begin altering or stripping the highlighted domain-specific attributes, periodic retraining on updated feature definitions would be needed to preserve performance.
Load-bearing premise
The 41,129-sample training set and the separate 9,000-sample temporal test set are drawn from distributions that remain representative of real-world macOS executables over time and are not biased by collection method or labeling errors.
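The premise above depends on the test set being strictly later than the training data. A minimal sketch of such a temporal split follows, assuming each sample carries a first-seen date; the `(first_seen, sample_id)` tuple layout and the cutoff are hypothetical, and the paper's actual collection protocol is not described in this summary.

```python
from datetime import date


def temporal_split(samples, cutoff):
    """Split (first_seen, sample_id) pairs at a cutoff date.

    Samples first seen before the cutoff go to training; the rest form the
    held-out set, so no held-out sample predates the training period.
    """
    train = [s for s in samples if s[0] < cutoff]
    held_out = [s for s in samples if s[0] >= cutoff]
    return train, held_out
```

Splitting on time rather than at random is what lets the held-out score say something about drift; it does not by itself remove collection or labeling bias.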
What would settle it
An independent collection of recent macOS binaries on which the detector without the domain-specific features matches or exceeds the detection rate of the full feature set.
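The comparison described above can be sketched as a difference of two detection rates on the same labeled samples. This assumes "detection rate" means the true positive rate on malicious samples, which this summary does not define; the function names are hypothetical.

```python
def detection_rate(y_true, y_pred):
    """Fraction of malicious samples (label 1) that were flagged as malicious."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not flagged:
        raise ValueError("no malicious samples in y_true")
    return sum(flagged) / len(flagged)


def ablation_gap(y_true, pred_full, pred_ablated):
    """Detection rate of the full model minus that of the ablated model."""
    return detection_rate(y_true, pred_full) - detection_rate(y_true, pred_ablated)
```

Under that reading, a gap at or below zero on an independent collection would count against the claim that the domain-specific features are essential.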
Original abstract
Despite the growing popularity of macOS among end users and enterprise systems, malware research has primarily focused on Windows and Android operating systems, leaving the problem of macOS malware detection relatively unexplored. Indeed, the specificity of the operating system and the unique characteristics of the Mach-O file format can play a fundamental role in the classification of unknown samples, drastically increasing the detection rate. In this work, for the first time in the literature, we employ new domain-specific features, i.e., static features specific to macOS binaries, such as embedded certificates, entitlements, persistence techniques and key system APIs, to train a machine learning malware detector. We perform a comprehensive experimental evaluation on a novel dataset of 41,129 samples, comprising 11,413 benign and 29,716 malicious executables, and demonstrate that our solution achieves state-of-the-art detection performance (98.50%), outperforming all existing approaches, with an average improvement of 16% in terms of detection rate. We also provide an in-depth analysis of the importance of the individual features, showing that our detector effectively leverages the new domain-specific features. Then, in order to evaluate the generalization capabilities of our detector over time, we perform a real-world evaluation on a new dataset of 9,000 fresh macOS executables. The results show that (i) our detector maintains a very high detection rate (99.50%), (ii) outperforms the state-of-the-art by 50%, and (iii) the domain-specific features are crucial for generalizing to novel malware samples, as their removal leads to a 15.92% drop in detection performance. Finally, we also release our dataset to the research community.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces domain-specific static features for macOS malware detection based on the Mach-O format (embedded certificates, entitlements, persistence techniques, and key system APIs). It reports training ML models on a new dataset of 41,129 samples (11,413 benign, 29,716 malicious) to achieve 98.50% detection performance, outperforming prior work by an average of 16%. A temporal hold-out evaluation on 9,000 fresh samples yields 99.50% detection, with an ablation study indicating that removing the domain-specific features causes a 15.92% drop; the dataset is released publicly.
Significance. If the performance numbers and ablation results hold under reliable labeling, the work advances the under-explored macOS malware detection literature by demonstrating the value of OS-specific features and providing the first large public macOS malware dataset with temporal validation. Releasing the dataset is a clear strength that supports reproducibility and follow-on research.
major comments (2)
- [Experimental Evaluation] Dataset Collection and Labeling: The samples are stated to have been collected from public repositories and labeled via AV engines, yet no quantitative audit is provided (e.g., number of scanners, agreement threshold for positive labels, inter-scanner consistency, or manual review of a held-out subset). Because the headline claims (98.50% accuracy, 99.50% temporal detection, and the 15.92% ablation delta) rest directly on label correctness, the absence of such validation leaves open the possibility that label noise inflates both absolute performance and the reported importance of the domain-specific features.
- [Methods / Feature Engineering] Feature Extraction Procedure: The extraction logic for the new domain-specific features (e.g., how entitlements are parsed or persistence techniques identified) is described at a high level without pseudocode, implementation details, or reference to the released dataset schema. This prevents independent verification that the ablation study isolates the contribution of these features rather than implementation artifacts, directly affecting the central claim that the features are “crucial for generalizing to novel malware samples.”
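The labeling audit requested in the first major comment can be illustrated with a small sketch: a threshold rule that turns per-engine verdicts into a label, plus a simple agreement measure. The engine names, the threshold value, and both function names are hypothetical; the paper's actual labeling rule is not given in this summary.

```python
def consensus_label(verdicts, min_positives):
    """Label a sample malicious (1) if at least min_positives engines flag it.

    verdicts maps an engine name to True (flagged) or False (not flagged).
    """
    positives = sum(verdicts.values())
    return int(positives >= min_positives)


def agreement_rate(verdicts):
    """Share of engines that agree with the majority verdict."""
    if not verdicts:
        raise ValueError("no verdicts supplied")
    positives = sum(verdicts.values())
    return max(positives, len(verdicts) - positives) / len(verdicts)
```

Reporting how labels change as `min_positives` varies, together with the distribution of `agreement_rate` over the dataset, is one way to quantify how sensitive the headline numbers are to label noise.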
minor comments (1)
- [Abstract] The abstract states that the approach “outperforms all existing approaches” without naming the baselines or citing them; moving the comparison details into the abstract or adding a short table reference would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important aspects of reproducibility and validation. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core claims.
- Referee: The samples are stated to have been collected from public repositories and labeled via AV engines, yet no quantitative audit is provided (e.g., number of scanners, agreement threshold for positive labels, inter-scanner consistency, or manual review of a held-out subset). Because the headline claims (98.50% accuracy, 99.50% temporal detection, and the 15.92% ablation delta) rest directly on label correctness, the absence of such validation leaves open the possibility that label noise inflates both absolute performance and the reported importance of the domain-specific features.
  Authors: We agree that explicit documentation of the labeling process is essential given the reliance on AV-based labels. The original manuscript described collection from public repositories and AV labeling at a high level. In revision we will add a dedicated subsection detailing the labeling procedure, including the specific AV engines consulted, the minimum agreement threshold applied for malicious labels, inter-scanner consistency metrics where available, and an explicit discussion of the limitations of AV labeling. This will allow readers to assess potential label noise directly.
  Revision: yes
- Referee: The extraction logic for the new domain-specific features (e.g., how entitlements are parsed or persistence techniques identified) is described at a high level without pseudocode, implementation details, or reference to the released dataset schema. This prevents independent verification that the ablation study isolates the contribution of these features rather than implementation artifacts, directly affecting the central claim that the features are “crucial for generalizing to novel malware samples.”
  Authors: We concur that greater implementation transparency is required to support independent verification of the ablation results. In the revised manuscript we will include pseudocode for the core extraction routines (certificate parsing, entitlement extraction, persistence technique identification, and API usage), provide additional implementation notes, and explicitly reference the schema and documentation of the publicly released dataset so that the ablation study can be reproduced from the released artifacts.
  Revision: yes
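As an illustration of the kind of extraction routine under discussion, the sketch below parses an entitlements property list into 0/1 flags with Python's standard `plistlib`. It assumes the entitlements blob has already been carved out of the binary's code signature, a step not shown here. The function name, the watched keys, and the sample plist are hypothetical, and this is not the authors' routine.

```python
import plistlib

# Hypothetical entitlements blob, as it might look once extracted.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>com.apple.security.app-sandbox</key><true/>
    <key>com.apple.security.network.client</key><false/>
</dict>
</plist>"""


def entitlement_flags(plist_bytes, watched):
    """Map each watched entitlement key to 1 if present with a truthy value."""
    entitlements = plistlib.loads(plist_bytes)
    return {key: int(bool(entitlements.get(key, False))) for key in watched}


flags = entitlement_flags(
    SAMPLE,
    [
        "com.apple.security.app-sandbox",
        "com.apple.security.network.client",
        "com.apple.security.get-task-allow",
    ],
)
```

Treating an absent key and an explicit `false` the same way is a design choice of this sketch; a fuller feature set might keep them distinct.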
Circularity Check
No circularity: standard empirical ML pipeline on independent datasets
The paper describes collection of a 41k-sample training set and a separate 9k temporal test set of macOS binaries, extraction of static features (including new domain-specific ones), training of an ML classifier, and reporting of accuracy/F1 on the held-out sets plus an ablation study. No equations, predictions, or uniqueness claims are present that reduce reported performance metrics to a parameter fitted from the same data by construction. The temporal test set is described as fresh and independent. Feature-importance analysis is post-hoc and does not feed back into the performance numbers. This is a conventional supervised-learning evaluation whose central claims rest on external data rather than self-referential definitions or self-citation chains.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The selected domain-specific features (certificates, entitlements, persistence techniques, key system APIs) are stable indicators that distinguish malicious from benign macOS binaries.