Model quality in football: Quantifying the quality of an Expected Threat model
Pith reviewed 2026-05-09 22:07 UTC · model grok-4.3
The pith
The Expected Threat model's error is approximately log-normally distributed for a fixed number of training points and game states; combined with expert consultation, this yields thresholds for when player evaluations become unreliable in scouting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using the Markov chain that underlies the Expected Threat model, theoretical analyses and simulations demonstrate that model error is approximately log-normally distributed once the number of training points and game states are fixed. These simulations, paired with expert consultation, establish the error magnitude beyond which player evaluations derived from the model become unreliable for scouting applications, from which the authors derive rules of thumb for ensuring model quality prior to deployment.
What carries the argument
The Markov chain representation of football game states, which enables both theoretical derivation and simulation of the model's error distribution.
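In this representation, each zone of the pitch is a Markov state, and xT is the fixed point of a Bellman-style recursion: a zone's value is its immediate scoring probability plus the probability-weighted value of the zones the ball moves to next. A minimal sketch on a toy one-dimensional pitch, with every probability invented for illustration (none come from the paper):

```python
# Minimal value-iteration sketch of an Expected Threat (xT) model on a
# toy 1-D pitch with 4 zones. All numbers are hypothetical; real xT models
# use a much finer grid fitted from event data.

N = 4  # number of game states (pitch zones)

# Hypothetical per-zone probabilities: shooting, scoring given a shot,
# and a row-stochastic move matrix over destination zones.
shoot = [0.01, 0.05, 0.15, 0.40]
score = [0.02, 0.05, 0.15, 0.35]
move = [
    [0.70, 0.20, 0.08, 0.02],
    [0.25, 0.45, 0.25, 0.05],
    [0.10, 0.30, 0.40, 0.20],
    [0.05, 0.15, 0.40, 0.40],
]

def solve_xt(shoot, score, move, iters=200):
    """Fixed-point iteration on the xT recursion:
    xT[z] = P(shoot|z)*P(goal|shot,z)
          + P(move|z) * sum_k T[z][k] * xT[k]."""
    xt = [0.0] * len(shoot)
    for _ in range(iters):
        xt = [
            shoot[z] * score[z]
            + (1 - shoot[z]) * sum(move[z][k] * xt[k] for k in range(len(xt)))
            for z in range(len(xt))
        ]
    return xt

xt = solve_xt(shoot, score, move)
# xT grows toward the attacking end, where shots are likelier and better
```

The iteration converges because the shot probability acts as an absorption term, making the update a contraction. It is this closed-form generative structure that lets the authors both derive the error distribution and simulate from it.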
If this is right
- Model error follows an approximately log-normal distribution once training points and game states are fixed.
- There exists an identifiable error threshold past which Expected Threat-based player evaluations are unreliable for scouting.
- Rules of thumb can be applied to check model quality before practical use.
- The same quantification framework extends directly to Expected Possession Value models.
- A validated model can be used to generate reliable player evaluations in scouting workflows.
Where Pith is reading between the lines
- Teams could tune the number of game states or training points to keep model error below the unreliability threshold while holding computational cost down.
- The log-normal error shape may allow simple statistical tests to certify a new model before it is put into production.
- Similar simulation-plus-expert protocols could be applied to other unobservable-ground-truth models in sports analytics.
- Long-term monitoring of actual match outcomes against model predictions would provide an ongoing check on whether the error threshold remains stable.
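The "simple statistical tests" idea above can be made concrete: if model error is approximately log-normal, fitting the two log-normal parameters to simulated errors yields an estimate of the probability that a deployed model's error exceeds the expert-derived threshold. The sketch below uses hypothetical log-normal parameters and a hypothetical threshold of 0.05 — neither value comes from the paper:

```python
# Certification sketch: fit a log-normal to simulated model errors and
# estimate the probability of exceeding a reliability threshold.
# Parameters (mu=-4.0, sigma=0.5) and threshold (0.05) are hypothetical.
import math
import random
import statistics

random.seed(1)

# Stand-in for errors produced by a Monte Carlo procedure like the paper's:
# here we simply draw them from an assumed log-normal distribution.
errors = [random.lognormvariate(-4.0, 0.5) for _ in range(2000)]

# Fit the log-normal by taking moments of the log-errors.
logs = [math.log(e) for e in errors]
mu_hat = statistics.fmean(logs)
sigma_hat = statistics.stdev(logs)

# Certification statistic: estimated probability that model error
# exceeds the expert-derived unreliability threshold.
threshold = 0.05
z = (math.log(threshold) - mu_hat) / sigma_hat
p_exceed = 1.0 - statistics.NormalDist().cdf(z)

# Deploy only if the exceedance probability is acceptably small.
certified = p_exceed < 0.05
```

Because the log-normal is fully determined by two parameters, a few thousand simulated errors suffice to estimate tail probabilities that would be expensive to observe directly.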
Load-bearing premise
The Markov chain accurately captures the real dynamics of football play and expert judgment supplies a valid threshold for when model error makes evaluations unreliable.
What would settle it
A large out-of-sample validation set in which the observed distribution of Expected Threat model errors deviates substantially from log-normality or in which player ratings remain stable and useful well beyond the expert-derived error threshold.
read the original abstract
The recent growth in data availability in football has increased the risk of incorrect use of data-driven models, making guidelines on their validation and application necessary. The Expected Threat (xT) model is an accessible option for football organizations that start building in-house methods, yet little is known about how to assess its quality. The aim of this study is twofold: to examine how the model error depends on the number of game states and the number of training points, and to translate these results into guidelines for constructing and applying the model. Using the Markov chain underlying the model, we perform theoretical analyses and simulations to study the model error. These show that the model error is approximately log-normally distributed for a specified number of training points and game states. Additionally, we combine the simulations with expert consultation to establish the model error beyond which player evaluations based on the Expected Threat model become unreliable for scouting applications. From this, we derive rules of thumb to ensure the quality of an Expected Threat model before application, and we illustrate through an example how a validated model can be applied in practice. Because the approach generalizes to Expected Possession Value models, this paper illustrates a framework to systematically quantify model quality, despite the ground truth being unobservable in football analytics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript examines the quality of Expected Threat (xT) models in football by leveraging the underlying Markov chain formulation. Through theoretical analyses and Monte Carlo simulations, it establishes that estimation error in xT values is approximately log-normally distributed for fixed numbers of training points and game states. The authors combine simulation results with expert consultation to identify a model-error threshold beyond which xT-based player evaluations become unreliable for scouting, from which they derive practical rules of thumb for model construction and illustrate an application example. The framework is presented as generalizable to Expected Possession Value models, providing a systematic approach to model validation where ground truth is unobservable.
Significance. If the internal error characterization and expert-calibrated threshold hold under the stated assumptions, the work supplies a concrete, simulation-driven framework for quantifying xT model quality that could help organizations avoid over-reliance on under-specified models in scouting and tactical analysis. The explicit use of the Markov chain for both theoretical derivations and controlled simulations is a methodological strength that allows precise statements about finite-sample behavior.
major comments (2)
- [simulation methodology and results] The Monte Carlo simulation design (described in the methods section on simulations and results) samples transition counts directly from the fitted Markov chain, thereby quantifying only sampling variance under correct specification. This construction supports the log-normal error claim within the model but does not inject continuous pitch locations, player-specific effects, or non-Markovian history that characterize real match data. Because the derived reliability threshold and rules of thumb are intended for scouting applications on actual data, the absence of misspecification analysis is load-bearing for the central claim that the guidelines ensure model quality in practice.
- [expert consultation and threshold derivation] The expert consultation used to set the numerical threshold for unreliable player evaluations (abstract and the section combining simulations with expert input) is presented without details on the number or expertise of participants, the precise elicitation protocol, or sensitivity of the threshold to alternative values. This threshold directly determines the rules of thumb, so lack of transparency and robustness checks weakens the translation from simulation results to actionable guidelines.
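The simulation design described in the first comment amounts to a parametric bootstrap: transition counts are resampled from the fitted chain, the model is refit, and the resulting xT error is recorded. A self-contained toy version, with an invented three-zone chain, illustrates why this quantifies sampling variance only — the data generator and the model coincide by construction:

```python
# Parametric-bootstrap sketch of the simulation design the referee
# describes. Toy 3-zone chain; all probabilities are illustrative.
import random

random.seed(0)

shoot = [0.02, 0.10, 0.30]
score = [0.03, 0.10, 0.30]
move = [[0.6, 0.3, 0.1],
        [0.3, 0.4, 0.3],
        [0.1, 0.4, 0.5]]

def solve_xt(move):
    """xT via fixed-point iteration on the toy chain."""
    xt = [0.0] * 3
    for _ in range(200):
        xt = [shoot[z] * score[z]
              + (1 - shoot[z]) * sum(move[z][k] * xt[k] for k in range(3))
              for z in range(3)]
    return xt

true_xt = solve_xt(move)

def bootstrap_error(n_moves):
    """Resample n_moves transitions per zone from the true chain,
    refit the move matrix, and return the max absolute xT error."""
    fitted = []
    for z in range(3):
        draws = random.choices(range(3), weights=move[z], k=n_moves)
        fitted.append([draws.count(k) / n_moves for k in range(3)])
    return max(abs(a - b) for a, b in zip(solve_xt(fitted), true_xt))

errors = [bootstrap_error(200) for _ in range(300)]
# More training points shrink these errors, but nothing here probes
# misspecification: the resampled data obey the Markov model exactly.
```

Any deviation of real match data from the Markov assumptions (spatial continuity, player effects, history dependence) adds error on top of what this procedure measures, which is the substance of the objection.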
minor comments (2)
- [abstract and conclusion] The abstract states that the approach 'generalizes to Expected Possession Value models' but provides no explicit demonstration or discussion of the required modifications; a short paragraph or appendix illustrating the extension would strengthen the claim.
- [notation and methods] Notation for the number of game states and training points is introduced without a consolidated table of symbols; adding such a table would improve readability when the log-normal parameters are later referenced.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. Below we provide point-by-point responses to the major comments. We agree with the need for greater transparency and will revise the manuscript accordingly to strengthen the presentation of our methods and results.
read point-by-point responses
- Referee: [simulation methodology and results] The Monte Carlo simulation design (described in the methods section on simulations and results) samples transition counts directly from the fitted Markov chain, thereby quantifying only sampling variance under correct specification. This construction supports the log-normal error claim within the model but does not inject continuous pitch locations, player-specific effects, or non-Markovian history that characterize real match data. Because the derived reliability threshold and rules of thumb are intended for scouting applications on actual data, the absence of misspecification analysis is load-bearing for the central claim that the guidelines ensure model quality in practice.
  Authors: Our simulations are specifically constructed to analyze the finite-sample behavior of the xT estimator under the Markov chain model assumptions, which enables the theoretical derivation of the approximate log-normal distribution of the error. This approach isolates the effect of the number of training points and game states on estimation error, providing a controlled environment to establish baseline reliability. We recognize that real-world football data may include additional complexities such as continuous spatial effects and non-Markovian dependencies not captured in the discrete state model. The rules of thumb are therefore presented as necessary but not sufficient conditions for model quality in practice, and we suggest they be used in conjunction with out-of-sample validation on real data. In the revised version, we will include an expanded discussion of these limitations and potential extensions to account for misspecification. revision: yes
- Referee: [expert consultation and threshold derivation] The expert consultation used to set the numerical threshold for unreliable player evaluations (abstract and the section combining simulations with expert input) is presented without details on the number or expertise of participants, the precise elicitation protocol, or sensitivity of the threshold to alternative values. This threshold directly determines the rules of thumb, so lack of transparency and robustness checks weakens the translation from simulation results to actionable guidelines.
  Authors: We acknowledge the importance of providing full details on the expert consultation to allow for proper evaluation of the threshold's robustness. The revised manuscript will include a more detailed description of the consultation process, specifying the number of experts involved, their relevant expertise in football analytics and scouting, the structured elicitation protocol employed, and results from sensitivity analyses varying the threshold value to assess impact on the derived rules of thumb. revision: yes
Circularity Check
No significant circularity: estimator error analysis is self-contained under the assumed model.
full rationale
The paper's central results derive from theoretical analysis and Monte Carlo simulation of the sampling distribution of the xT estimator when data are generated from the fitted Markov chain itself. This is a standard statistical procedure for characterizing finite-sample properties and does not reduce the reported log-normal error distribution or the expert-calibrated reliability threshold to any definitional equivalence, fitted-input renaming, or self-citation chain. The Markov chain representation and expert consultation are treated as external inputs rather than outputs of the same fitting step. No load-bearing self-citation, ansatz smuggling, or uniqueness theorem imported from prior author work is required for the claims.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Football can be represented as a Markov chain whose states capture the relevant game situations for threat calculation.
Reference graph
Works this paper leans on
- [1] Dello Iacono, A., Datson, N., Clubb, J., Lacome, M., Sullivan, A., Shushan, T.: Data analytics practices and reporting strategies in senior football: insights into athlete health and performance from over 200 practitioners worldwide. Science and Medicine in Football 10(1), 80–95 (2025). https://doi.org/10.1080/24733938.2025.2476478
- [2] Kholkine, L.: Opportunities and challenges of machine learning in sports. In: Raval, M.S., Kaya, T., Artan, N.S., Taber, C. (eds.) Sports Data Analytics: Techniques, Applications, and Innovations, pp. 243–260. Springer, Singapore (2026). https://doi.org/10.1007/978-981-95-5132-3_13
- [3] Teixeira, J.E., Maio, E., Afonso, P., Encarnação, S., Machado, G.F., Morgans, R., Barbosa, T.M., Monteiro, A.M., Forte, P., Ferraz, R., Branquinho, L.: Mapping football tactical behavior and collective dynamics with artificial intelligence: a systematic review. Frontiers in Sports and Active Living 7 (2025). https://doi.org/10.3389/fspor.2025.1569155
- [4] Rico-González, M., Pino-Ortega, J., Méndez, A., Clemente, F., Baca, A.: Machine learning application in soccer: a systematic review. Biology of Sport 40(1), 249–263 (2023). https://doi.org/10.5114/biolsport.2023.112970
- [5] Olthof, S., Davis, J.: Perspectives on data analytics for gaining a competitive advantage in football: computational approaches to tactics. Science and Medicine in Football, 1–13 (2025). https://doi.org/10.1080/24733938.2025.2533784
- [6] Robberechts, P., Davis, J.: How data availability affects the ability to learn good xG models. In: Proceedings of the 7th Workshop on Machine Learning and Data Mining for Sports Analytics, pp. 17–27. Springer International Publishing, Cham (2020). https://doi.org/10.1007/978-3-030-64912-8_2
- [7] Anzer, G., Bauer, P.: A goal scoring probability model for shots based on synchronized positional and event data in football (soccer). Frontiers in Sports and Active Living 3 (2021). https://doi.org/10.3389/fspor.2021.624475
- [8] Mead, J., O'Hare, A., McMenemy, P.: Expected goals in football: Improving model performance and demonstrating value. PLOS ONE 18(4), e0282295 (2023). https://doi.org/10.1371/journal.pone.0282295
- [9] Rudd, S.: A Framework for Tactical Analysis and Individual Offensive Production Assessment in Soccer Using Markov Chains. Presented at the 2011 New England Symposium on Statistics in Sports, Harvard University, Cambridge, MA, 24 September 2011 (2011)
- [10] Van Roy, M., Robberechts, P., Decroos, T., Davis, J.: Valuing on-the-ball actions in soccer: A critical comparison of xT and VAEP. In: Proceedings of the AAAI 2020 Workshop on Artificial Intelligence in Team Sports. AAAI Press, New York, USA (2020). https://tomdecroos.github.io/reports/xt_vs_vaep.pdf
- [11] Singh, K.: Introducing Expected Threat (xT). https://karun.in/blog/expected-threat.html. Accessed: 5-11-2024 (2018)
- [12] Rein, R., Memmert, D.: Big data and tactical analysis in elite soccer: future challenges and opportunities for sports science. SpringerPlus 5(1) (2016). https://doi.org/10.1186/s40064-016-3108-2
- [13] Herold, M., Goes, F., Nopp, S., Bauer, P., Thompson, C., Meyer, T.: Machine learning in men's professional football: Current applications and future directions for improving attacking play. International Journal of Sports Science & Coaching 14(6), 798–817 (2019). https://doi.org/10.1177/1747954119879350
- [14] Bransen, L.: Beyond the scoreline: Using machine learning techniques to understand (women's) soccer. PhD thesis, KU Leuven (2025)
- [15] Van Arem, K.W., Bruinsma, M.: Extended xThreat: an explainable quality assessment method for actions in football using game context. Paper presented at the 15th International Conference on the Engineering of Sport (ISEA 2024), Loughborough University, Loughborough, 8–11 July 2024 (2024). https://doi.org/10.17028/RD.LBORO.27045427.V1
- [16] Hassani, K., Ramdani, M., Lotfi, M.: Dynamic expected threat (dxT) model: Addressing the deficit of realism in football action evaluation. Applied Sciences 15(8) (2025). https://doi.org/10.3390/app15084151
- [17] Fernández, J., Bornn, L., Cervone, D.: Decomposing the Immeasurable Sport: A Deep Learning Expected Possession Value Framework for Soccer. Paper presented at the 13th MIT Sloan Sports Analytics Conference, Boston, MA, 1-2 March 2019 (2019). https://www.sloansportsconference.com/research-papers/decomposing-the-immeasurable-sport-a-deep-learning-expected...
- [18] Fernández, J., Bornn, L., Cervone, D.: A framework for the fine-grained evaluation of the instantaneous expected value of soccer possessions. Machine Learning 110(6), 1389–1427 (2021). https://doi.org/10.1007/s10994-021-05989-6
- [19] Stöckl, M., Seidl, T., Marley, D., Power, P.: Making Offensive Play Predictable: Using a Graph Convolutional Network to Understand Defensive Performance in Soccer. Paper presented at the 15th MIT Sloan Sports Analytics Conference, Boston, MA, 8-9 April 2021 (2021). https://www.sloansportsconference.com/research-papers/making-offensive-play-predictable-...
- [20] Overmeer, T., Janssen, T., Nuijten, W.: Revisiting Expected Possession Value in Football: Introducing a U-Net architecture, reward and risk for passes, and a benchmark. Paper presented at the 13th International Conference on Sport Sciences Research and Technology Support (icSPORTS), Marbella, Spain, 21-22 October 2025 (2025). https://doi.org/10.5220/00137...
- [21] Van Roy, M., Robberechts, P., Yang, W.-C., De Raedt, L., Davis, J.: A markov framework for learning and reasoning about strategies in professional soccer. Journal of Artificial Intelligence Research 77, 517–562 (2023). https://doi.org/10.1613/jair.1.13934
- [22] Pulis, M., Bajada, J.: Reinforcement Learning for Football Player Decision Making Analysis. Paper presented at the StatsBomb Conference, London, 20 September 2022 (2022). https://www.um.edu.mt/library/oar/handle/123456789/131785
- [23] Rahimian, P., Van Haaren, J., Toka, L.: Towards maximizing expected possession outcome in soccer. International Journal of Sports Science & Coaching 19(1), 230–244 (2024). https://doi.org/10.1177/17479541231154494
- [24] Kim, H., Seo, S., Choi, H., Boomstra, T., Yoon, J., Park, C.: Better Prevent than Tackle: valuing defense in soccer based on graph neural networks. Paper presented at the MIT Sloan Sports Analytics Conference, 6–7 March 2026 (2026). https://www.sloansportsconference.com/research-papers/better-prevent-than-tackle-valuing-defense-in-soccer-based-on-grap...
- [25] StatsBomb: Open Data. https://github.com/statsbomb/open-data. Accessed: 2024-10-15 (2024)
- [26] Davis, J., Bransen, L., Devos, L., et al.: Methodology and evaluation in sports analytics: Challenges, approaches, and lessons learned. Machine Learning 113, 6977–7010 (2024). https://doi.org/10.1007/s10994-024-06585-0
- [27] (DHPC): DelftBlue Supercomputer (Phase 2). https://www.tudelft.nl/dhpc/ark:/44463/DelftBluePhase2 (2024)