
arxiv: 2606.11769 · v1 · pith:HFBBNW2Dnew · submitted 2026-06-10 · 💻 cs.AI · cs.LG

When Do Data-Driven Systems Exhibit the Capability to Infer?

Pith reviewed 2026-06-27 09:59 UTC · model grok-4.3

keywords: AI Act · inference capability · credit scoring · data processing workflow · statistical models · high-risk AI · regulatory framework · statistical learning theory

The pith

The capability to infer under the AI Act must be assessed across the entire data processing workflow rather than the model alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper creates a framework that assigns levels of inference capability to data-driven systems based on statistical learning theory. It applies the framework to realistic credit scoring workflows to determine when they meet the AI definition in the European AI Act. The analysis shows that inference capability can appear in systems built from statistical models depending on workflow steps and that human experts' involvement during development can raise or lower that capability. A sympathetic reader would care because credit scoring is listed as high-risk in Annex III, so the framework clarifies which implementations trigger the Act's obligations. The work also identifies areas where the Act needs more regulatory clarity on borderline cases.

Core claim

The framework grades inference capability into levels motivated by statistical learning theory; when applied to credit scoring, it establishes that the full data processing workflow determines whether sufficient inference occurs to qualify as an AI system under the AI Act, and that the degree of human expert involvement during development can significantly alter the assigned level.

What carries the argument

A multi-level grading framework for inference capability that evaluates how much a workflow goes beyond fixed statistical models by incorporating data-driven adaptation or inference steps.

If this is right

  • Credit scoring systems can exhibit inference capability even when built from statistical models if the workflow contains data-driven elements.
  • Human expert involvement in development or feature engineering can reduce a system's assigned inference level.
  • The entire pipeline, including preprocessing and postprocessing, must be examined rather than the model in isolation.
  • Further regulatory guidance is needed for systems whose inference level falls in a gray area between the defined thresholds.
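The difference between looking at the model alone and looking at the whole pipeline can be sketched in a few lines. This is a hypothetical illustration only: the `Step` class, the single `fitted_from_data` flag, and the three example steps are assumptions made here, and they do not reproduce the paper's graded levels or its actual workflows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    fitted_from_data: bool  # simplifying assumption: one yes/no flag per step


def flagged_steps(workflow):
    """Return the names of all steps whose behaviour is derived from training data."""
    return [step.name for step in workflow if step.fitted_from_data]


# Hypothetical workflow: the final model uses expert-set points,
# but an upstream binning step is learned from data.
workflow = [
    Step("data cleaning rules written by an analyst", False),
    Step("decision tree binning of features", True),
    Step("scorecard with expert-assigned points", False),
]

model_only = workflow[-1].fitted_from_data  # inspects the final model in isolation
whole_workflow = flagged_steps(workflow)    # inspects every step

print(model_only)      # False
print(whole_workflow)  # ['decision tree binning of features']
```

In this toy case a model-only check finds nothing data-driven, while the workflow-level check surfaces the binning step, which is the kind of discrepancy the bullets above describe.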

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Developers could structure workflows to keep inference below the Act's threshold and thereby avoid high-risk obligations.
  • The same grading approach could be tested on other Annex III domains such as employment screening or insurance pricing.
  • Purely statistical systems might perform similar functions to AI systems yet fall outside the Act's scope if the workflow is designed accordingly.

Load-bearing premise

The levels of inference capability defined in the framework align with the legal interpretation of inference under the AI Act and Commission Guidelines.

What would settle it

A concrete EU regulatory decision or court ruling on whether a credit scoring workflow that uses only non-adaptive statistical models without data-driven elements in any stage counts as having inference capability.

Figures

Figures reproduced from arXiv: 2606.11769 by Maximilian Poretschkin, Tabea Naeven.

Figure 1: Decision tree binning for the feature N30-59Late. Node labels show split criteria, (rounded) sample proportions, class distributions in the format value = [good, bad], and entropy values for each bin. Note that this is a simple tree with only three bins for illustration; more complex trees are constructed in Example I.
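The kind of entropy-based binning that Figure 1 depicts can be illustrated with a minimal sketch. It searches for the single threshold on one feature that minimises the size-weighted entropy of the two resulting bins; applying it recursively to a bin would yield further bins. The function names `entropy` and `best_split` and the toy data are assumptions made for illustration and are not taken from the paper or its repository.

```python
import math


def entropy(good, bad):
    """Shannon entropy in bits of a bin with the given class counts."""
    total = good + bad
    h = 0.0
    for count in (good, bad):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def best_split(values, labels):
    """Find the threshold t for the split `value <= t` that minimises
    the size-weighted entropy of the two bins (labels: 0 = good, 1 = bad).
    Returns (None, parent_entropy) if no split improves on the parent."""
    n = len(values)
    best = (None, entropy(labels.count(0), labels.count(1)))
    for t in sorted(set(values))[:-1]:  # the largest value cannot split the data
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        h = (len(left) * entropy(left.count(0), left.count(1))
             + len(right) * entropy(right.count(0), right.count(1))) / n
        if h < best[1]:
            best = (t, h)
    return best


# Synthetic toy data: number of late payments and a good (0) / bad (1) label.
late_counts = [0, 0, 0, 0, 1, 1, 2, 3]
is_bad = [0, 0, 0, 0, 0, 1, 1, 1]

threshold, weighted_entropy = best_split(late_counts, is_bad)
print(threshold, round(weighted_entropy, 4))  # 0 0.4056
```

The threshold here is chosen entirely by the data, which is why a binning step of this kind is relevant when asking where inference occurs in a workflow.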
Original abstract

The European AI Act is the first comprehensive regulation of artificial intelligence (AI), setting out extensive obligations, particularly for so-called high-risk and general-purpose AI systems. A key distinguishing feature of AI systems under the AI Act is the capability to infer. Since the AI Act does not clearly define what inference is, there is a gray area for certain data-driven systems. A specific example is credit scoring systems, which are listed by Annex III of the AI Act. At the same time, however, these are often implemented using statistical models for which it is unclear whether they have the capability to infer and thus fall under the AI definition of the AI Act at all. Motivated by statistical learning theory, this work develops a framework for grading different levels of the capability to infer. Based on the AI Act and the Commission Guidelines on the definition of an artificial intelligence system, we analyze which levels constitute sufficient capability to infer within the meaning of the AI Act and where further regulatory clarity is needed. We illustrate the framework by creating two realistic credit scoring workflows and show whether and where inference occurs in them. Our analysis illustrates that not only individual models but the entire data processing workflow must be considered. It also shows that the involvement of human experts during development can have significant influence on the capability to infer. Code can be found at https://github.com/fraunhofer-iais/inference-framework-creditscorecards.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a framework, motivated by statistical learning theory, for grading levels of 'capability to infer' in data-driven systems. It analyzes this against the EU AI Act's definition of AI systems, particularly for credit scoring workflows listed in Annex III. Through two realistic credit scoring examples, it concludes that the entire data processing workflow must be considered, not just individual models, and that human expert involvement during development significantly influences whether the system exhibits inference capability. Code is provided for reproducibility.

Significance. If the proposed grading levels accurately capture the legal notion of inference under the AI Act, this work offers a practical tool for assessing regulatory obligations for statistical models in high-risk areas like credit scoring. It emphasizes workflow-level analysis and the role of human experts, which could inform both developers and regulators. The provision of concrete workflows and open code strengthens its utility for the community.

major comments (2)
  1. [§4] §4 (Mapping to the AI Act and Commission Guidelines): the claim that framework levels 3+ constitute sufficient 'capability to infer' under the AI Act rests on the authors' interpretive reading of the Act and Guidelines; no formal legal derivation, external expert validation, or cross-check against enforcement practice is provided, yet this mapping is load-bearing for the regulatory conclusions about credit scoring workflows.
  2. [§5.2] §5.2 (Credit Scoring Workflow 2): the analysis concludes that human expert involvement during feature engineering reduces inference capability below the AI Act threshold, but the grading criteria for 'human involvement' are not operationalized with measurable thresholds, making the workflow-level claim difficult to verify or replicate.
minor comments (2)
  1. [Abstract] The abstract states the motivation and high-level approach but provides no details on framework construction or grading criteria; expanding the abstract to include one sentence on the levels would improve accessibility.
  2. [Table 1] Table 1 (Framework Levels): the distinction between levels 2 and 3 uses terms from statistical learning theory without explicit cross-reference to the specific theorems or definitions invoked, which could be clarified with a short footnote.

Simulated Author's Rebuttal

2 responses · 1 unresolved

Thank you for the opportunity to respond to the referee's comments. We appreciate the detailed feedback and address each major comment point by point below, indicating where revisions will be made to the manuscript.

point-by-point responses
  1. Referee: [§4] §4 (Mapping to the AI Act and Commission Guidelines): the claim that framework levels 3+ constitute sufficient 'capability to infer' under the AI Act rests on the authors' interpretive reading of the Act and Guidelines; no formal legal derivation, external expert validation, or cross-check against enforcement practice is provided, yet this mapping is load-bearing for the regulatory conclusions about credit scoring workflows.

    Authors: We concur that the mapping from our framework levels to the AI Act's notion of inference capability is based on our interpretive reading of the Act and the accompanying Commission Guidelines. As the manuscript is primarily a technical contribution motivated by statistical learning theory, we did not include a formal legal derivation or seek external legal validation. We will revise the text in §4 to more clearly articulate that this mapping represents our technical interpretation informed by the regulatory documents, and to note the need for further regulatory clarity as already mentioned in the paper. We cannot, however, provide a formal legal analysis or cross-check against enforcement practice within the scope of this work.
    revision: partial

  2. Referee: [§5.2] §5.2 (Credit Scoring Workflow 2): the analysis concludes that human expert involvement during feature engineering reduces inference capability below the AI Act threshold, but the grading criteria for 'human involvement' are not operationalized with measurable thresholds, making the workflow-level claim difficult to verify or replicate.

    Authors: We agree that the criteria for assessing human involvement lack measurable thresholds, which limits the replicability of the analysis in §5.2. In the revised manuscript, we will operationalize these criteria by providing specific examples and qualitative thresholds for what constitutes 'significant' human involvement in feature engineering, drawing from the credit scoring workflow examples. This will include clearer distinctions between levels of involvement and how they affect the inference grading.
    revision: yes

standing simulated objections not resolved
  • Lack of formal legal derivation, external expert validation, or cross-check against enforcement practice for the mapping in §4

Circularity Check

0 steps flagged

No circularity: framework and mapping rest on external legal texts and statistical learning theory

full rationale

The paper defines inference-capability levels from statistical learning theory and performs an interpretive mapping to the AI Act and Commission Guidelines on the definition of an AI system. Both sources are external to the paper. The central claims—that the full workflow must be assessed and that human involvement matters—are illustrated by constructing two credit-scoring examples; these examples do not reduce the framework levels or the legal mapping to fitted parameters or self-referential definitions. No equations, self-citations, or ansatzes are shown to create a definitional loop. The derivation chain is therefore self-contained against the cited external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on legal definitions from the AI Act and statistical learning theory without introducing fitted parameters or new entities; the central analysis depends on the assumption that the proposed levels map to the legal concept of inference.

axioms (1)
  • domain assumption: The AI Act distinguishes AI systems by the capability to infer, as elaborated in the Commission Guidelines.
    This premise is invoked to motivate the framework and determine sufficient levels for regulatory coverage.

pith-pipeline@v0.9.1-grok · 5780 in / 1119 out tokens · 27513 ms · 2026-06-27T09:59:46.193336+00:00 · methodology

