
arxiv: 2606.28917 · v1 · pith:3N2I6PWLnew · submitted 2026-06-27 · 💻 cs.LG

ML-Powered LDAP Reconnaissance Detection using Weak Supervision

Pith reviewed 2026-06-30 09:56 UTC · model grok-4.3

classification 💻 cs.LG
keywords LDAP reconnaissance · weak supervision · Active Directory · threat detection · signature mining · machine learning classifier · endpoint correlation

The pith

Correlating LDAP queries with endpoint detections supplies weak labels that train a classifier to flag malicious reconnaissance at up to 65 percent true positive rate on holdout data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that weak supervision built from linking LDAP queries to known endpoint detections can label a large corpus without manual annotation. This labeled data trains a classifier that separates malicious from benign LDAP queries while keeping false positives low. A second statistical-mining step extracts concrete query signatures that security teams can deploy immediately. The approach trades some label noise for scale, making automated detection feasible where hand labeling would be too slow or costly. If the correlation step holds, defenders gain an early-warning layer against identity-based attacks that begin with directory reconnaissance.

Core claim

By treating endpoint detections as noisy but abundant labels, the authors construct a large training set of LDAP queries and train a classifier that reaches 65 percent true positive rate on a held-out set while constraining false positives. The same weak-supervision pipeline feeds a hypothesis-testing procedure that mines new malicious LDAP signatures; these signatures are validated at 81.48 percent field precision by operational security analysts.
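The correlation rule itself is not described in the material summarized here, so the following is only a minimal sketch of one plausible rule: an LDAP query is weakly labeled malicious when an endpoint detection fired on the same host within a fixed time window. The record fields (`host`, `ts`, `query`), the 10-minute default window, and the function name `weak_label` are assumptions made for illustration, not the authors' method.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LdapQuery:
    host: str   # hypothetical field: machine that issued the query
    ts: float   # hypothetical field: epoch seconds
    query: str  # raw LDAP filter string


@dataclass(frozen=True)
class Detection:
    host: str   # hypothetical field: machine where the endpoint detection fired
    ts: float   # hypothetical field: epoch seconds


def weak_label(queries, detections, window_s=600.0):
    """Return (query, label) pairs. The label is 1 if any detection on the
    same host lies within window_s seconds of the query, otherwise 0."""
    by_host = defaultdict(list)
    for d in detections:
        by_host[d.host].append(d.ts)
    for times in by_host.values():
        times.sort()

    labeled = []
    for q in queries:
        times = by_host.get(q.host, [])
        # Index of the first detection at or after (q.ts - window_s).
        i = bisect_left(times, q.ts - window_s)
        hit = i < len(times) and times[i] <= q.ts + window_s
        labeled.append((q, 1 if hit else 0))
    return labeled


demo_queries = [
    LdapQuery("ws-01", 1000.0, "(objectClass=user)"),
    LdapQuery("ws-02", 1000.0, "(objectClass=user)"),
]
demo_detections = [Detection("ws-01", 1300.0)]
demo_labels = [label for _, label in weak_label(demo_queries, demo_detections)]
# demo_labels == [1, 0]: the same query text gets different labels on different hosts.
```

The demo shows why such labels are noisy: identical query text can be labeled both malicious and benign depending on what else happened on the host, which is the kind of noise the classifier has to tolerate.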

What carries the argument

Weak supervision pipeline that correlates LDAP queries with endpoint detections to generate training labels, followed by statistical hypothesis testing to extract malicious query signatures.
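The exact hypothesis test and multiple-testing correction are not given in the material summarized here. As a hedged sketch of how such a mining step could work, the code below tests each candidate signature with an exact one-sided binomial test against the corpus-wide malicious base rate and then applies the Holm step-down correction. The choice of test, the candidate names, and the `(malicious_hits, total_hits)` count layout are all assumptions for illustration.

```python
from math import comb


def binom_sf(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def holm_reject(pvalues, alpha=0.05):
    """Holm step-down procedure. Returns a list of booleans, True where
    the null hypothesis is rejected at family-wise error rate alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # every remaining (larger) p-value is also not rejected
    return reject


def mine_signatures(counts, base_rate, alpha=0.05):
    """counts maps a candidate signature to (malicious_hits, total_hits).
    Keeps candidates whose malicious rate is significantly above base_rate."""
    names = list(counts)
    pvals = [binom_sf(counts[s][0], counts[s][1], base_rate) for s in names]
    keep = holm_reject(pvals, alpha)
    return [s for s, r in zip(names, keep) if r]


# Hypothetical counts: candidate -> (weakly labeled malicious, total matches).
demo_counts = {"sigA": (9, 10), "sigB": (1, 10), "sigC": (0, 5)}
demo_kept = mine_signatures(demo_counts, base_rate=0.1)
# demo_kept == ["sigA"]
```

Controlling the family-wise error rate matters here because a mining step may test many candidates at once, and an uncorrected threshold would let some pass by chance alone.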

If this is right

  • Security teams can label and train on orders of magnitude more LDAP data than manual methods allow.
  • The mined signatures can be turned into production detection rules without further model training.
  • Early reconnaissance activity becomes detectable before the attacker moves to later attack stages.
  • The same correlation technique can be reused for other directory or authentication protocols that lack labeled attack data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If endpoint detections themselves contain systematic bias, the learned classifier may overfit to whatever behaviors those detections already catch rather than discovering new reconnaissance patterns.
  • The method could be extended by adding a small amount of human review on the highest-uncertainty queries to reduce label noise without losing scale.
  • Similar weak-supervision pipelines might apply to other high-volume log sources such as DNS or PowerShell where ground-truth labels are scarce.

Load-bearing premise

Linking LDAP queries to endpoint detections produces labels whose noise and bias remain small enough that the resulting classifier and signatures still generalize.

What would settle it

Run the trained classifier on a fresh set of LDAP queries that have been independently labeled by human analysts and measure whether true positive rate falls substantially below 65 percent or signature precision falls substantially below 81 percent.
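A minimal sketch of how that check could be scored, assuming analyst labels of 1 (malicious) or 0 (benign) and classifier scores where higher means more suspicious. It computes the best true positive rate under a false positive budget and a Wilson score interval for a precision estimate. The function names and the example counts are hypothetical; 22 of 27 is used only because it rounds to 81.48 percent, and the paper's actual sample size is not stated in the material summarized here.

```python
from math import sqrt


def tpr_at_fpr(scores, labels, max_fpr):
    """Highest TPR over all score thresholds whose FPR stays <= max_fpr.
    Requires at least one positive and one negative label."""
    pos = sum(labels)
    neg = len(labels) - pos
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if fp / neg <= max_fpr:
            best = max(best, tp / pos)
    return best


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical analyst-labeled sample.
demo_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
demo_truth = [1, 1, 0, 1, 0, 0]
demo_tpr = tpr_at_fpr(demo_scores, demo_truth, max_fpr=0.0)

# Hypothetical signature review: 22 confirmed out of 27 reviewed.
demo_low, demo_high = wilson_interval(22, 27)
```

With a sample as small as the hypothetical 27, the interval is wide, so "falls substantially below" should be judged against the interval rather than the point estimate.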

Figures

Figures reproduced from arXiv: 2606.28917 by Asaf Romano, Avraham Kama, Benjamin Malmberg, Dor Agron, Edward Raff, Michael Brautbar, Sagi Sheinfeld, Shaefer Drew, Yaron Zinar.

Figure 1: Identity Attack Kill Chain. This shows how threat ...
Figure 2: Label leakage example showing identical LDAP ...
Figure 3: Grouped Splitting: A Process that splits the dataset ...
Figure 4: Sample LDAP ML Detection. This shows a sample ...
Figure 6: Example LDAP Model ROC Curve for Validation ...
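The captions for Figures 2 and 3 suggest that identical LDAP queries can leak labels across a naive split and that the dataset is split by group instead. The captions are truncated, so the grouping key is not known; the sketch below assumes grouping by normalized query text and uses a hash to place every copy of the same query on the same side. The `normalize` rule and the 20 percent holdout fraction are hypothetical.

```python
import hashlib


def normalize(query):
    """Hypothetical normalization: lowercase and remove all whitespace."""
    return "".join(query.lower().split())


def grouped_split(queries, holdout_fraction=0.2):
    """Split queries into (train, holdout) so that all copies of the same
    normalized query land on the same side of the split."""
    train, holdout = [], []
    for q in queries:
        digest = hashlib.sha256(normalize(q).encode("utf-8")).digest()
        # Map the hash to one of 10,000 buckets, scaled to [0, 1).
        bucket = (int.from_bytes(digest[:8], "big") % 10_000) / 10_000
        (holdout if bucket < holdout_fraction else train).append(q)
    return train, holdout
```

Because the assignment depends only on the query text, the split is deterministic and needs no shared state, at the cost of the holdout share being approximate rather than exact.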
Original abstract

Lightweight Directory Access Protocol (LDAP) is a protocol that allows users to query and modify Active Directory (AD) data. By default, all users have read access to all AD data through LDAP, making it a common initial tool for reconnaissance when a threat actor first compromises an identity. To capture threat actors early in the reconnaissance phase, we developed two machine learning frameworks to detect LDAP reconnaissance: an ML classifier to predict malicious LDAP queries and an ML-based data-mining method to extract malicious query signatures. By correlating LDAP queries with endpoint detections, the first framework uses weak supervision to label a massive dataset and classify LDAP queries as malicious or benign. For immediate deployment, a second technique was developed on top of this approach to employ a rigorous statistical hypothesis-testing framework for mining novel, malicious LDAP signatures. While this weakly supervised approach is limited compared with manual human labeling, it is more practical for this use case because it leverages large-scale automated corpus construction, reducing costs and time. Ultimately, both the LDAP classifier and the ML-based LDAP signature mining method achieved performance benchmarks, with the classifier achieving up to a 65% True Positive Rate (TPR) on the holdout set while limiting false positives, and mined signatures demonstrating 81.48% field precision with CrowdStrike's Managed Detection and Response team.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The manuscript presents two ML frameworks for detecting LDAP reconnaissance in Active Directory: a classifier trained via weak supervision by correlating LDAP queries with external endpoint detections to label a large dataset (reporting up to 65% TPR on holdout while limiting false positives), and a statistical hypothesis-testing approach to mine malicious query signatures (reporting 81.48% field precision validated by CrowdStrike MDR). The work emphasizes the practicality of weak supervision for scalable, low-cost labeling over manual annotation.

Significance. If the weak labels are shown to be sufficiently accurate and unbiased, the approach could enable practical, large-scale early detection of identity reconnaissance without heavy manual effort, which is a genuine operational need in enterprise security. The real-world field validation of mined signatures is a concrete strength that supports deployability.

major comments (3)
  1. [Abstract] The central performance claims (65% TPR for the classifier; 81.48% precision for signatures) rest on labels produced by correlating LDAP queries with endpoint detections. No correlation rule, label-error estimate, or expert validation of the resulting labels is described, so it is impossible to determine whether the reported metrics reflect genuine malicious patterns or artifacts of the labeling heuristic.
  2. [Methods/Results] No details are supplied on model architecture, feature engineering for LDAP queries, holdout-set construction, or the exact statistical hypothesis tests and multiple-testing corrections used for signature mining. These omissions are load-bearing because they prevent assessment of whether the 65% TPR is reproducible or statistically supported.
  3. [Abstract/Results] The statement that weak supervision is 'limited compared with manual human labeling' but 'more practical' requires quantitative support on label quality; absent any reported validation (e.g., precision of the correlation step on a held-out expert-labeled subset), the performance numbers cannot be interpreted as evidence of effective detection.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, indicating where revisions will be made to the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The central performance claims (65% TPR for the classifier; 81.48% precision for signatures) rest on labels produced by correlating LDAP queries with endpoint detections. No correlation rule, label-error estimate, or expert validation of the resulting labels is described, so it is impossible to determine whether the reported metrics reflect genuine malicious patterns or artifacts of the labeling heuristic.

    Authors: We agree that the correlation rules and any assessment of label quality require more explicit description. In the revised manuscript we will add the exact correlation heuristic (including matching criteria between LDAP queries and endpoint detections), a discussion of observed label noise, and potential sources of bias. Expert validation on a held-out subset was not performed, as the scale of the data makes this impractical—the core motivation for weak supervision. We will note the 81.48% field precision on mined signatures as downstream validation of the pipeline. These additions will appear in the Methods section. revision: yes

  2. Referee: [Methods/Results] No details are supplied on model architecture, feature engineering for LDAP queries, holdout-set construction, or the exact statistical hypothesis tests and multiple-testing corrections used for signature mining. These omissions are load-bearing because they prevent assessment of whether the 65% TPR is reproducible or statistically supported.

    Authors: We acknowledge that these methodological specifics were insufficiently detailed. The revised manuscript will include the model architecture, the complete feature engineering process applied to LDAP queries, the holdout-set construction procedure, and the precise statistical hypothesis tests together with the multiple-testing correction method used for signature mining. These additions will allow readers to evaluate reproducibility and statistical support for the reported metrics. revision: yes

  3. Referee: [Abstract/Results] The statement that weak supervision is 'limited compared with manual human labeling' but 'more practical' requires quantitative support on label quality; absent any reported validation (e.g., precision of the correlation step on a held-out expert-labeled subset), the performance numbers cannot be interpreted as evidence of effective detection.

    Authors: We agree the statement would be strengthened by quantitative context on label quality. However, an expert-labeled subset for direct precision estimation of the correlation step is not available, as creating one would require the manual effort the weak-supervision approach was designed to avoid. We will revise the abstract and discussion to qualify the claim more precisely, emphasize the independent 81.48% field precision on signatures, and report any available proxy indicators of label quality from our analysis. revision: partial

standing simulated objections not resolved
  • Direct quantitative validation of weak-label precision on an expert-labeled held-out subset (such a subset does not exist in our data).

Circularity Check

0 steps flagged

No circularity: labeling uses external endpoint detections; performance measured on independent holdout

Full rationale

The paper constructs weak labels by correlating LDAP queries with endpoint detections from an external source and then trains an ML classifier on those labels, reporting TPR on a holdout set. This is a standard weak-supervision pipeline with no equations, self-citations, or fitted parameters that reduce the reported performance metric to the labeling rule by construction. The derivation chain remains self-contained against the external detections and does not invoke any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central performance claims rest on the assumption that endpoint detections provide reliable proxy labels and that the statistical hypothesis tests identify truly novel malicious signatures. Because only the abstract is available, the paper's own free parameters, axioms, and invented entities cannot be enumerated; the single axiom listed below is the domain assumption implied by the abstract.

axioms (1)
  • domain assumption: Endpoint detections accurately indicate malicious activity that can be reliably correlated with LDAP queries
    This supplies the weak labels for the entire training corpus.

pith-pipeline@v0.9.1-grok · 5787 in / 1114 out tokens · 25905 ms · 2026-06-30T09:56:07.569396+00:00 · methodology


Reference graph

Works this paper leans on

33 extracted references · 15 canonical work pages

  1. [1]

    Dirk-jan. 2018. BloodHound.py. https://github.com/dirkjanm/BloodHound.py Accessed 2026-02-05

  2. [2]

    FalconForce. 2024. SOAPHound. https://github.com/FalconForceTeam/SOAPHound Accessed 2026-02-05

  3. [3]

    GhostPack. 2018. Rubeus. https://github.com/GhostPack/Rubeus Accessed 2026-02-05

  4. [4]

    SpecterOps. 2021. SharpHound. https://github.com/SpecterOps/SharpHound Accessed 2026-02-05

  5. [5]

    SpecterOps. 2023. BloodHound. https://github.com/SpecterOps/BloodHound Accessed 2026-02-05

  6. [6]

    Dor Agron, Michael Avraham Brautbar, Shaefer Drew, Avraham Kama, Asaf Lavi, Sagi Sheinfeld, and Yaron Zinar. 2025. Machine-Learned Suspicious Query Detection. US Patent App. 18/630,106

  7. [7]

    Sylvain Arlot and Alain Celisse. 2010. A survey of cross-validation procedures for model selection. Statistics Surveys 4 (2010), 40–79

  8. [8]

    Prasasthy Balasubramanian, Tarek Ali, Mohammad Salmani, Danial KhoshKholgh, and Panos Kostakos. 2024. Hex2Sign: Automatic IDS Signature Generation from Hexadecimal Data using LLMs. In 2024 IEEE International Conference on Big Data (BigData). 4524–4532. doi:10.1109/BigData62323.2024.10825710 ISSN: 2573-2978

  9. [9]

    James Bergstra, Brent Komer, Chris Eliasmith, Dan Yamins, and David Cox. 2015. Hyperopt: A Python library for model selection and hyperparameter optimization. Computational Science & Discovery 8 (07 2015), 014008. doi:10.1088/1749-4699/8/1/014008

  10. [10]

    Christian Blichmann. 2008. Automatisierte Signaturgenerierung für Malware-Stämme. Ph. D. Dissertation. Technical University of Dortmund

  11. [11]

    Carlo Bonferroni. 1936. Teoria statistica delle classi e calcolo delle probabilita. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze 8 (1936), 3–62.

    KDD ’26, August 09–13, 2026, Jeju Island, Republic of Korea. Shaefer Drew et al.

  12. [12]

    Ju Chen, Jinghan Wang, Chengyu Song, and Heng Yin. 2022. JIGSAW: Efficient and Scalable Path Constraints Fuzzing. In 2022 IEEE Symposium on Security and Privacy (SP). 18–35. doi:10.1109/SP46214.2022.9833796 ISSN: 2375-1207

  13. [13]

    Tianqi Chen. 2016. XGBoost: A Scalable Tree Boosting System. Cornell University (2016)

  14. [14]

    Jacob Cohen. 1988. Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates, Hillsdale, NJ

  15. [15]

    Shaefer Drew, Mickey Brautbar, and Yaron Zinar. 2025. Caught in the Act: CrowdStrike’s New ML-Powered LDAP Reconnaissance Detections. CrowdStrike Blog. https://www.crowdstrike.com/en-us/blog/inside-crowdstrike-ml-powered-ldap-reconnaissance-detections/ Accessed: 2025-12-01

  16. [16]

    Min Du, Wenjun Hu, and William Hewlett. 2021. AutoCombo: Automatic Malware Signature Generation Through Combination Rule Mining. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management (CIKM ’21). Association for Computing Machinery, New York, NY, USA, 3777–3786. doi:10.1145/3459637.3481896

  17. [17]

    Jerome H. Friedman. 2001. Greedy function approximation: A gradient boosting machine. The Annals of Statistics 29, 5 (2001), 1189–1232. doi:10.1214/aos/1013203451

  18. [18]

    Sture Holm. 1979. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6, 2 (1979), 65–70

  19. [19]

    Maya Kapoor, Garrett Fuchs, and Jonathan Quance. 2021. RExACtor: Automatic Regular Expression Signature Generation for Stateless Packet Inspection. In 2021 IEEE 20th International Symposium on Network Computing and Applications (NCA). 1–9. doi:10.1109/NCA53618.2021.9685959 ISSN: 2643-7929

  20. [20]

    Hyang-Ah Kim and Brad Karp. 2004. Autograph: toward automated, distributed worm signature detection. In Proceedings of the 13th conference on USENIX Security Symposium - Volume 13. 19. doi:10.1.1.94.5342

  21. [21]

    Shijia Li, Jiang Ming, Pengda Qiu, Qiyuan Chen, Lanqing Liu, Huaifeng Bao, Qiang Wang, and Chunfu Jia. 2023. PackGenome: Automatically Generating Robust YARA Rules for Accurate Malware Packer Detection. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS ’23). Association for Computing Machinery, New York, NY, USA,...

  22. [22]

    James Newsome, Brad Karp, and Dawn Song. 2005. Polygraph: Automatically Generating Signatures for Polymorphic Worms. In Proceedings of the 2005 IEEE Symposium on Security and Privacy. IEEE Computer Society, Washington, DC, USA, 226–241. doi:10.1109/SP.2005.15 Series Title: SP ’05

  23. [23]

    F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830

  24. [24]

    Sayan Putatunda and Dr Kiran R. 2018. A Comparative Analysis of Hyperopt as Against Other Approaches for Hyper-Parameter Optimization of XGBoost. SPML ’18: Proceedings of the 2018 International Conference on Signal Processing and Machine Learning, 6–10. doi:10.1145/3297067.3297080

  25. [25]

    Edward Raff, Richard Zak, Gary Lopez Munoz, William Fleming, Hyrum S. Anderson, Bobby Filar, Charles Nicholas, and James Holt. 2020. Automatic Yara Rule Generation Using Biclustering. In 13th ACM Workshop on Artificial Intelligence and Security (AISec’20). doi:10.1145/3411508.3421372 arXiv: 2009.03779

  26. [26]

    M Zubair Rafique and Juan Caballero. 2013. FIRMA: Malware Clustering and Network Signature Generation with Mixed Network Behaviors. In Proceedings of the 16th International Symposium on Research in Attacks, Intrusions, and Defenses - Volume 8145. Springer-Verlag New York, Inc., New York, NY, USA, 144–163. doi:10.1007/978-3-642-41284-4_8 Series Title: RAID 2013

  27. [27]

    Alexander Ratner, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, and Christopher Ré. 2017. Snorkel: rapid training data creation with weak supervision. Proceedings of the VLDB Endowment 11, 3 (Nov. 2017), 269–282. doi:10.14778/3157794.3157797

  28. [28]

    Team Snorkel. 2022. Weak Supervision Modeling with Fred Sala. https://snorkel.ai/blog/weak-supervision-modeling/

  29. [29]

    Shobha Venkataraman, Avrim Blum, and Dawn Song. 2008. Limits of Learning-based Signature Generation with Adversaries. In NDSS. http://repository.cmu.edu/ece

  30. [30]

    Peter H Westfall and S Stanley Young. 1993. Resampling-based multiple testing: Examples and methods for p-value adjustment. John Wiley & Sons

  31. [31]

    Edwin B. Wilson. 1927. Probable Inference, the Law of Succession, and Statistical Inference. J. Amer. Statist. Assoc. 22, 158 (1927), 209–212. doi:10.1080/01621459.1927.10502953

  32. [32]

    Vinod Yegneswaran, Jonathon T Giffin, Paul Barford, and Somesh Jha. 2005. An Architecture for Generating Semantics-aware Signatures. In Proceedings of the 14th Conference on USENIX Security Symposium - Volume 14. USENIX Association, Berkeley, CA, USA, 7. http://dl.acm.org/citation.cfm?id=1251398.1251405 Series Title: SSYM’05

  33. [33]

    XiangRui Zhang, XueJie Du, HaoYu Chen, Yongzhong He, Wenjia Niu, and Qiang Li. 2025. Automatically Generating Rules of Malicious Software Packages via Large Language Model. In 2025 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). 734–747. doi:10.1109/DSN64029.2025.00072 ISSN: 2158-3927.

    A LDAP Tutorial The dataset c...