pith. machine review for the scientific record.

arxiv: 2604.15740 · v1 · submitted 2026-04-17 · 💻 cs.CY

Recognition: unknown

Evidence Sufficiency Under Delayed Ground Truth: Proxy Monitoring for Risk Decision Systems

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 08:14 UTC · model grok-4.3

classification 💻 cs.CY
keywords evidence sufficiency · delayed ground truth · proxy monitoring · drift detection · risk decision systems · governance · machine learning · fraud detection

The pith

Delayed outcome labels degrade evidence quality in four measurable dimensions that proxy indicators can monitor without waiting for results.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Machine learning risk systems in fraud, credit, and clinical domains must decide before outcome labels arrive, leaving a blind period where evidence quality erodes. The paper defines evidence sufficiency through four dimensions and shows how three kinds of drift drive distinct degradation paths in those dimensions. It supplies a proxy monitoring system built from seven unlabeled measurement categories that estimates the current sufficiency level and flags which drift types remain invisible. Experiments on a large fraud dataset confirm that the proxies catch covariate and mixed drift at 100 percent while missing pure concept drift, and that sufficiency scores fall steadily over time with concept drift causing the steepest drop. The result is a governance instrument that turns drift signals into auditable readiness assessments whose blind spots are explicitly mapped.

Core claim

The paper formalizes an evidence sufficiency model with four dimensions—completeness, freshness, reliability, and representativeness—plus a decision-readiness gate that quantifies how label latency degrades evidence. It maps three drift types to dimension-specific degradation trajectories and introduces a proxy indicator framework of seven measurement categories that estimates sufficiency loss without labels, together with coverage mappings and per-drift blind spots. On the IEEE-CIS Fraud Detection dataset with controlled drift injection, the composite proxy score detects covariate and mixed drift at 100 percent, while concept drift without feature change remains undetected, matching the theoretical impossibility of unsupervised detection when P(X) is unchanged.
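
To make the detection asymmetry concrete, here is a minimal sketch, in Python, of the kind of unlabeled covariate-drift check such proxies can rely on: a per-feature two-sample Kolmogorov-Smirnov test. The function name and threshold are assumptions, not the paper's implementation; the point is that a feature-only test can fire on P(X) shifts but is structurally blind to pure concept drift, where P(Y|X) changes while P(X) does not.

```python
# Illustrative sketch (not the paper's implementation): a per-feature
# two-sample Kolmogorov-Smirnov test is one standard unlabeled proxy for
# covariate drift. The threshold and names here are assumptions.
import numpy as np
from scipy.stats import ks_2samp

def covariate_drift_flags(reference, window, alpha=0.01):
    """Flag each feature column whose distribution shifted relative to
    the reference sample; needs no outcome labels."""
    return [ks_2samp(reference[:, j], window[:, j]).pvalue < alpha
            for j in range(reference.shape[1])]

rng = np.random.default_rng(0)
ref = rng.normal(size=(5000, 3))

# Covariate drift: P(X) changes, so a feature-only test can fire.
print(any(covariate_drift_flags(ref, ref + [0.5, 0.0, 0.0])))  # True

# Pure concept drift: P(Y|X) changes while P(X) stays fixed, so the
# same test sees nothing -- the blind spot the paper characterizes.
print(any(covariate_drift_flags(ref, rng.normal(size=(5000, 3)))))  # usually False
```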

What carries the argument

The four-dimension evidence sufficiency model together with the seven-category proxy indicator framework that converts unlabeled measurements into drift-specific degradation trajectories and auditable readiness scores.
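
A hypothetical sketch of how such a coverage mapping might be wired up. The seven category names and the min-over-dimensions aggregation below are assumptions for illustration; this review does not reproduce the paper's exact definitions.

```python
# Hypothetical wiring of the coverage idea: each unlabeled proxy category
# informs one or more sufficiency dimensions, and the decision-readiness
# gate is taken as the weakest dimension (an assumed aggregation).
import numpy as np

DIMENSIONS = ["completeness", "freshness", "reliability", "representativeness"]

COVERAGE = {  # which dimensions each proxy category informs (illustrative)
    "volume":             ["completeness"],
    "missingness":        ["completeness", "reliability"],
    "label_latency":      ["freshness"],
    "feature_drift":      ["representativeness"],
    "score_distribution": ["representativeness", "reliability"],
    "calibration_proxy":  ["reliability"],
    "population_mix":     ["representativeness", "freshness"],
}

def composite_sufficiency(proxy_scores):
    """Average each dimension over the proxies covering it, then gate on
    the weakest dimension."""
    dim_scores = {
        dim: np.mean([proxy_scores[p] for p, dims in COVERAGE.items()
                      if dim in dims])
        for dim in DIMENSIONS
    }
    return min(dim_scores.values())

scores = {p: 0.9 for p in COVERAGE}
scores["feature_drift"] = 0.4          # a representativeness proxy degrades
print(composite_sufficiency(scores))   # gated by the weakest dimension
```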

If this is right

  • Governance teams can set explicit thresholds for when a risk system remains decision-ready despite pending labels.
  • Drift alerts can be converted into dimension-specific sufficiency reports rather than generic warnings.
  • The framework reveals which drift types require labeled data because they are invisible to unsupervised proxies.
  • Blind-spot mappings allow organizations to add targeted supplementary checks for specific drift patterns.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same proxy approach could be adapted to other delayed-feedback domains such as medical treatment outcomes or long-horizon forecasting.
  • Calibration of sufficiency thresholds to each deployment's risk tolerance would be required before operational use.
  • Combining the proxies with occasional minimal label sampling could shrink the blind spot for concept drift without full retraining (see the sketch after this list).
  • The model offers a way to make existing governance frameworks more sensitive to the timing of evidence rather than only to model accuracy.
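
A hedged sketch of the minimal-label-sampling idea from the third bullet: audit a small random sample of outcomes as their labels arrive and test whether the observed error rate significantly exceeds the training-time baseline. The sample size and the one-sided binomial test are illustrative assumptions, not the paper's method.

```python
# Minimal label sampling to probe the concept-drift blind spot: a small
# audited sample of arrived labels is tested against the baseline error
# rate. All numbers here are illustrative assumptions.
from scipy.stats import binomtest

def concept_drift_audit(n_errors, n_sampled, baseline_error, alpha=0.05):
    """True if the audited error rate significantly exceeds baseline,
    catching P(Y|X) shifts that unlabeled proxies cannot see."""
    p = binomtest(n_errors, n_sampled, baseline_error,
                  alternative="greater").pvalue
    return p < alpha

# e.g. 9 errors in 50 audited outcomes against a 6% baseline error rate
print(concept_drift_audit(9, 50, 0.06))  # True: escalate for review
```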

Load-bearing premise

The seven proxy categories can accurately estimate degradation in all four sufficiency dimensions without access to outcome labels and the drift-to-trajectory mappings match real degradation mechanisms.

What would settle it

A deployment in which the proxy-derived sufficiency score fails to track the actual drop in decision quality once the delayed labels finally arrive, or in which concept drift produces undetected performance loss beyond the theoretical limit.

read the original abstract

Machine learning systems in fraud detection, credit scoring, and clinical risk assessment operate under delayed ground truth: outcome labels arrive days to months after the decision they evaluate. During this blind period, governance evidence degrades through mechanisms that neither drift detection methods nor governance frameworks adequately address. This paper formalizes an evidence sufficiency model with four dimensions (completeness, freshness, reliability, representativeness) and a decision-readiness gate that quantifies how label latency degrades evidence quality. The model maps three drift types to dimension-specific degradation trajectories. A complementary proxy indicator framework comprising seven measurement categories estimates sufficiency degradation without labels, with explicit coverage mapping and characterized blind spots per drift type. Evaluation on the IEEE-CIS Fraud Detection dataset (~590K transactions) with controlled drift injection shows that composite proxy monitoring detects covariate and mixed drift with 100% detection rate, while concept drift without feature change remains undetected -- consistent with the theoretical impossibility of unsupervised detection when P(X) is unchanged. Blind period simulation confirms monotone sufficiency degradation, with concept drift degrading fastest (S=0.242 at day 60 vs 0.418 for no-drift). The framework contributes a governance sufficiency monitoring instrument; its value lies in translating drift signals into auditable sufficiency assessments with characterized blind spots. Mapping sufficiency levels to governance actions requires deployment-specific calibration beyond this study's scope.
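
For intuition about the abstract's blind-period numbers, a small illustration follows. Exponential decay is an assumed functional form, with rates back-fitted so that the day-60 values equal the reported endpoints (S=0.418 no-drift, S=0.242 concept drift); the paper's actual degradation model is not given in this review and may differ.

```python
# Illustration only: monotone sufficiency decay with rates back-fitted to
# the abstract's reported day-60 endpoints. Not the paper's model.
import math

def sufficiency(day, s60):
    """S(t) = exp(-lam * t), with lam chosen so that S(60) = s60."""
    lam = -math.log(s60) / 60.0
    return math.exp(-lam * day)

for day in (0, 15, 30, 45, 60):
    print(f"day {day:2d}  no-drift S={sufficiency(day, 0.418):.3f}"
          f"  concept-drift S={sufficiency(day, 0.242):.3f}")
```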

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper formalizes an evidence sufficiency model with four dimensions (completeness, freshness, reliability, representativeness) and a decision-readiness gate for ML risk systems under delayed ground truth. It maps three drift types to dimension-specific degradation trajectories and introduces a proxy indicator framework with seven measurement categories to estimate sufficiency degradation without labels. Evaluation on the IEEE-CIS Fraud Detection dataset (~590K transactions) with controlled drift injection reports 100% detection for covariate and mixed drift, zero detection for concept drift without feature change, and monotone sufficiency degradation in blind-period simulations (e.g., S=0.242 for concept drift vs. 0.418 for no-drift at day 60). The framework is positioned as a governance monitoring instrument with characterized blind spots.

Significance. If the proxy-to-sufficiency mapping holds, the work provides a useful governance instrument for high-stakes domains by translating drift signals into auditable sufficiency assessments during label latency, with explicit coverage and blind spots. Strengths include the public dataset, controlled drift injection experiments, and reproducible detection rates consistent with theoretical expectations for unsupervised settings.

major comments (2)
  1. [Evaluation on the IEEE-CIS Fraud Detection dataset] Evaluation section: the reported 100% proxy-based detection for covariate/mixed drift and monotone S degradation (S=0.242 vs. 0.418 at day 60) do not include quantitative validation (e.g., correlation, recovery error, or dimension-wise alignment metrics) showing that the seven proxy categories recover or track the four-dimension sufficiency degradation trajectories in the absence of labels. This leaves the central claim that proxies estimate sufficiency degradation as an untested modeling assumption.
  2. [Proxy indicator framework] Proxy indicator framework: the explicit coverage mapping from the seven proxy measurement categories to the four sufficiency dimensions is presented, but the blind-period simulation and drift-injection results do not test whether these proxies produce estimates that correlate with the simulated degradation levels across completeness, freshness, reliability, and representativeness when ground truth is unavailable.
minor comments (2)
  1. [Abstract] Abstract: lacks details on the exact definitions or formulas for the seven proxy categories and the decision-readiness gate.
  2. [Blind period simulation] Blind period simulation: clarify the aggregation formula used to compute the composite sufficiency score S from the four dimensions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. The feedback correctly identifies a gap in the empirical validation of the proxy-to-sufficiency mapping. We address each major comment below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Evaluation on the IEEE-CIS Fraud Detection dataset] Evaluation section: the reported 100% proxy-based detection for covariate/mixed drift and monotone S degradation (S=0.242 vs. 0.418 at day 60) do not include quantitative validation (e.g., correlation, recovery error, or dimension-wise alignment metrics) showing that the seven proxy categories recover or track the four-dimension sufficiency degradation trajectories in the absence of labels. This leaves the central claim that proxies estimate sufficiency degradation as an untested modeling assumption.

    Authors: We agree that the current evaluation relies on overall detection rates and aggregate S trajectories without reporting direct quantitative alignment between the seven proxy categories and the four dimension-specific degradation levels. The proxy mapping was constructed from definitional coverage (e.g., volume and missingness proxies for completeness), and the controlled experiments confirm expected detection behavior, but this does not substitute for explicit correlation or recovery metrics. In the revised manuscript we will add, in the Evaluation section, dimension-wise proxy scores, Pearson correlations with simulated degradation per dimension, and mean absolute recovery error for the blind-period simulations on the IEEE-CIS dataset. revision: yes

  2. Referee: [Proxy indicator framework] Proxy indicator framework: the explicit coverage mapping from the seven proxy measurement categories to the four sufficiency dimensions is presented, but the blind-period simulation and drift-injection results do not test whether these proxies produce estimates that correlate with the simulated degradation levels across completeness, freshness, reliability, and representativeness when ground truth is unavailable.

    Authors: The referee is correct that the reported results emphasize composite detection and monotone aggregate S degradation rather than per-dimension correlation tests under label latency. While the coverage mapping is explicit in the framework section, the empirical link to simulated trajectories was not quantified. We will revise the blind-period simulation subsection to include these tests: time-series correlations between each proxy category and the corresponding dimension degradation, plus cross-dimension alignment statistics, using the same controlled drift injections. This will provide direct evidence that the proxies track sufficiency degradation when ground truth is unavailable; a sketch of these alignment metrics follows the responses. revision: yes
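
A minimal sketch of the alignment metrics both responses promise, assuming proxy estimates and simulated ground-truth degradation are available as per-dimension time series; the array layout and names are illustrative, not the manuscript's.

```python
# Per-dimension Pearson correlation and mean absolute recovery error
# between label-free proxy estimates and simulated degradation, each
# given as an (n_timesteps, 4) array with one column per dimension.
import numpy as np
from scipy.stats import pearsonr

DIMS = ["completeness", "freshness", "reliability", "representativeness"]

def alignment_metrics(proxy_est, simulated):
    """Return {dimension: {'pearson_r': r, 'mae': e}} quantifying how
    well the proxies track the simulated trajectories."""
    out = {}
    for j, dim in enumerate(DIMS):
        r, _ = pearsonr(proxy_est[:, j], simulated[:, j])
        mae = float(np.mean(np.abs(proxy_est[:, j] - simulated[:, j])))
        out[dim] = {"pearson_r": r, "mae": mae}
    return out
```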

Circularity Check

0 steps flagged

No circularity: definitions and empirical measurements remain independent

full rationale

The paper introduces an evidence sufficiency model with four dimensions and a proxy indicator framework with seven categories as independent constructs, then evaluates them via controlled drift injection on the external IEEE-CIS Fraud Detection dataset. Reported outcomes (100% detection for covariate/mixed drift, S values at day 60, blind spots for concept drift) are measured simulation results rather than quantities recovered by construction from fitted parameters or self-referential mappings. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the derivation; the mapping from proxies to dimensions is presented as an explicit modeling choice with characterized coverage, not as a tautology. The central claims therefore rest on external data and stated assumptions rather than reducing to the inputs by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on newly introduced conceptual constructs (the four-dimensional model and proxy framework) whose validity is supported only by the described simulation and dataset evaluation; no external benchmarks or independent derivations are referenced.

axioms (2)
  • domain assumption The four dimensions of completeness, freshness, reliability, and representativeness adequately capture all mechanisms by which label latency degrades evidence quality.
    Invoked when formalizing the evidence sufficiency model and decision-readiness gate.
  • domain assumption Proxy indicators drawn from seven measurement categories can estimate sufficiency degradation without access to delayed labels.
    Basis for the complementary proxy indicator framework and its coverage mapping.
invented entities (2)
  • Evidence sufficiency model with decision-readiness gate no independent evidence
    purpose: Quantifies how label latency degrades evidence quality across four dimensions
    Newly formalized construct; no independent evidence or prior literature reference provided.
  • Proxy indicator framework with seven measurement categories no independent evidence
    purpose: Estimates sufficiency degradation without labels and maps blind spots per drift type
    Introduced as complementary monitoring instrument; no external validation cited.

pith-pipeline@v0.9.0 · 5531 in / 1629 out tokens · 42704 ms · 2026-05-10T08:14:31.417270+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Label-Free Detection of Governance Evidence Degradation in Risk Decision Systems

    cs.CY · 2026-04 · unverdicted · novelty 6.0

    A composite multi-proxy framework detects harmful drift in label-free risk decision systems and enables graduated governance alerts.

Reference graph

Works this paper leans on

47 extracted references · 43 canonical work pages · cited by 1 Pith paper · 1 internal anchor
