
arxiv: 2606.11120 · v1 · pith:I6CDPYO7 · submitted 2026-06-09 · 💻 cs.AI · cs.CV

Monte Carlo Pass Search: Using Trajectory Generation for 3D Counterfactual Pass Evaluation in Football

Pith reviewed 2026-06-27 13:15 UTC · model grok-4.3

classification 💻 cs.AI cs.CV
keywords: football pass evaluation · Monte Carlo search · counterfactual analysis · trajectory generation · value models · 3D tracking data · execution surplus · autoregressive world model

The pith

Monte Carlo search over pass variants produces distribution-aware value attribution for football passes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper recasts pass evaluation as a Monte Carlo search problem: it infers kick parameters from each observed pass, then samples multiple execution variants and option variants. Each candidate is rolled forward by a ball-conditioned world model until the next ball interaction, after which a learned value model assigns a possession-value outcome; repeating this process yields a full distribution over possible value gained. From that distribution the method derives two execution-surplus scores, one based on the mean and one on the observed pass's percentile rank within the distribution, which attribute credit or blame more finely than single-point estimates. A sympathetic reader would care because conventional pass metrics often ignore what else the passer could plausibly have done, whereas a distribution over counterfactual outcomes gives a direct measure of the surplus value created. The work also shows that an autoregressive trajectory generator originally developed for autonomous driving (SMART) can be adapted to produce the required multi-agent, ball-conditioned rollouts on the first public Bundesliga tracking dataset with 3D ball trajectories.
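How the paper infers kick parameters is not described on this page. A minimal sketch, assuming a drag-free, spin-free ballistic ball with gravity along z: fit the launch position and velocity to the first few tracked 3D ball positions by per-axis least squares, then express the velocity as speed, azimuth, and elevation. The function `infer_kick_params` and this three-number parameterization are hypothetical, not the paper's.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumption: no air drag, no spin)

def _fit_line(ts, ys):
    """Ordinary least squares for y = a + b*t; returns (a, b)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    b = sxy / sxx
    return y_mean - b * t_mean, b

def infer_kick_params(ts, positions):
    """Fit p(t) = p0 + v0*t - 0.5*g*t^2 (gravity on z only) to observed ball positions.

    ts: timestamps in seconds, relative to the kick.
    positions: list of (x, y, z) in metres.
    Returns (p0, speed, azimuth_rad, elevation_rad); a hypothetical parameterization.
    """
    xs = [p[0] for p in positions]
    ys = [p[1] for p in positions]
    # Add back the known gravity term so z is linear in t: z + 0.5*g*t^2 = z0 + vz*t.
    zs = [p[2] + 0.5 * G * t * t for t, p in zip(ts, positions)]
    x0, vx = _fit_line(ts, xs)
    y0, vy = _fit_line(ts, ys)
    z0, vz = _fit_line(ts, zs)
    speed = math.sqrt(vx * vx + vy * vy + vz * vz)
    azimuth = math.atan2(vy, vx)                     # direction in the pitch plane
    elevation = math.atan2(vz, math.hypot(vx, vy))   # launch angle above the ground
    return (x0, y0, z0), speed, azimuth, elevation
```

Real passes curve and decelerate, so a model with drag and spin would likely be needed in practice; the sketch only shows that kick parameters can be recovered from 3D ball tracks by inverting a physical model.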

Core claim

Monte Carlo Pass Search (MCPS) infers kick parameters for each observed pass, samples execution variants and option variants, rolls each candidate forward with a ball-conditioned world model until the next ball interaction, and scores outcomes with a learned value model to obtain a distribution over gained value; this distribution enables distribution-aware attribution with two complementary execution-surplus scores (mean-based and percentile-based).
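This page does not reproduce the paper's formulas. One reading consistent with the captions of Figures 6 and 7 (average observed-minus-counterfactual ∆PV, and percentile rank of the observed ∆PV within the counterfactual distribution) is the following, where $\Delta PV_{\text{obs}}$ is the observed pass's value gain and $\Delta PV_1, \dots, \Delta PV_N$ are the gains of the $N$ sampled counterfactual rollouts. The notation is assumed, not taken from the paper.

```latex
S_{\text{mean}} = \Delta PV_{\text{obs}} - \frac{1}{N}\sum_{i=1}^{N} \Delta PV_i,
\qquad
S_{\text{pct}} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left[\Delta PV_i \le \Delta PV_{\text{obs}}\right]
```

Under this reading, $S_{\text{mean}} > 0$ means the observed pass beat the average sampled alternative, and $S_{\text{pct}}$ close to 1 means it beat nearly all of them; the two can disagree when the counterfactual distribution is skewed.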

What carries the argument

Monte Carlo Pass Search (MCPS), which combines kick-parameter inference, variant sampling, autoregressive ball-conditioned trajectory rollouts, and value-model scoring to generate outcome distributions.
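A minimal Python sketch of that loop, assuming hypothetical `world_model` and `value_model` callables (the released SMART-based world model and the paper's value model are not reproduced here) and Gaussian perturbation of kick parameters for execution variants. The toy stand-ins exist only to make the sketch runnable; the default of 256 samples follows the count quoted in the Figure 1 caption.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

KickParams = Tuple[float, ...]  # hypothetical kick parameterization, e.g. (speed, direction)

def monte_carlo_pass_search(
    observed_kick: KickParams,
    world_model: Callable[[KickParams, random.Random], object],
    value_model: Callable[[object], float],
    n_samples: int = 256,
    noise_std: Sequence[float] = (1.0, 0.05),
    seed: int = 0,
) -> List[float]:
    """Return one value gain per sampled execution variant of the observed kick."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        # Execution variant: Gaussian perturbation of each inferred kick parameter.
        variant = tuple(p + rng.gauss(0.0, s) for p, s in zip(observed_kick, noise_std))
        # World model: roll the scene forward to the next ball interaction.
        end_state = world_model(variant, rng)
        # Value model: score the state reached at that interaction.
        values.append(value_model(end_state))
    return values

# Toy stand-ins (hypothetical) so the sketch runs end to end.
def toy_world_model(kick: KickParams, rng: random.Random) -> float:
    speed, direction = kick
    # Ball travels for 1 s in a straight line; return its forward (x) displacement.
    return speed * math.cos(direction) + rng.gauss(0.0, 0.5)

def toy_value_model(end_x: float) -> float:
    return end_x / 100.0  # more forward progress, more value

distribution = monte_carlo_pass_search((20.0, 0.1), toy_world_model, toy_value_model)
```

Option variants (passes toward other teammates or target points) would be sampled inside the same loop; they are omitted to keep the sketch short.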

If this is right

  • The method supplies both a mean-based and a percentile-based execution-surplus score for each pass.
  • Pass ranking and player analysis can now be performed on full distributions rather than point estimates.
  • The adapted SMART generator achieves strong best-of-20 forecasting accuracy on the 3D tracking data while supporting fully hypothetical rollouts.
  • Model checkpoints and code are released to enable further use of the trajectory generator for evaluation tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same search structure could be applied to other discrete actions such as shots or carries by redefining the variant-sampling step around those actions.
  • If the world model continues to generalize, the approach could support live decision-support tools that surface high-surplus passing options in real time.
  • The public release of the adapted generator invites direct comparison against other multi-agent simulators on the same 3D football data.

Load-bearing premise

The adapted SMART autoregressive trajectory generator produces accurate multi-agent ball-conditioned rollouts in hypothetical counterfactual football scenarios.

What would settle it

If direct multi-step forecasting tests on held-out 3D Bundesliga sequences showed the adapted SMART model with higher error than simpler baselines, that would indicate the generated value distributions rest on unreliable rollouts.
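Such a test needs a simple reference model. A minimal sketch, assuming 2D player positions sampled at a fixed interval, of a constant-velocity baseline and its average displacement error (ADE) against held-out future positions; the function names are hypothetical, and the paper's own baselines are not listed on this page.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def constant_velocity_forecast(history: List[Point], horizon: int) -> List[Point]:
    """Repeat the last observed per-step displacement for `horizon` future steps."""
    (x1, y1), (x2, y2) = history[-2], history[-1]
    vx, vy = x2 - x1, y2 - y1  # displacement per time step
    return [(x2 + vx * k, y2 + vy * k) for k in range(1, horizon + 1)]

def ade(pred: List[Point], truth: List[Point]) -> float:
    """Average Euclidean displacement error over the forecast horizon."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)
```

A learned world model whose held-out multi-step ADE is not clearly below a baseline of this kind would give little reason to trust its counterfactual rollouts.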

Figures

Figures reproduced from arXiv: 2606.11120 by Andrew Kang, Priya Narasimhan.

Figure 1: From point estimates to distributions. Existing pass-evaluation workflows (right) primarily score the observed outcome, with limited counterfactual reasoning and little notion of execution sensitivity. We introduce Monte Carlo Pass Search (MCPS; left): for each observed pass we infer kick parameters, sample 256 counterfactual executions and alternatives, generate short-horizon futures with a trajectory-con…
Figure 2: TRACAB Gen5 camera setup [17] used to estimate player coordinates and 3D ball coordinates in the public tracking dataset [3] used in this paper.

Monte Carlo Tree Search, model-based RL, and world models. Our planning perspective is inspired by policy learning methods, although we use it primarily for offline action evaluation. MuZero, which combines tree search with a learned dynamics model to attain a str…
Figure 3: Workflow for each simulated pass in our Monte Carlo …
Figure 4: Example MCPS evaluation (local search). Left: observed ball trajectory and sampled local-variant trajectories (colored according to PV obtained, ∆PV). Top-right: distribution of counterfactual pass ∆PV under local perturbations, with the observed pass shown as a dashed red line. Bottom-right: ∆PV versus distance from the observed pass in inferred kick-parameter space, illustrating sensitivity of downstream…
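The bottom-right panel of Figure 4 relates ∆PV to distance from the observed pass in kick-parameter space. A minimal sketch of that diagnostic, assuming each kick is a tuple of parameters and that distances are standardized by per-parameter scales; the scales, the bin width, and the function names are assumptions, not the paper's choices.

```python
import math
from typing import Dict, List, Sequence

def kick_distance(a: Sequence[float], b: Sequence[float], scales: Sequence[float]) -> float:
    """Euclidean distance after dividing each parameter difference by its scale."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))

def dpv_by_distance(
    observed: Sequence[float],
    variants: List[Sequence[float]],
    dpvs: List[float],
    scales: Sequence[float],
    bin_width: float = 0.5,
) -> Dict[float, float]:
    """Mean dPV of the variants in each distance bin, keyed by the bin's lower edge."""
    sums: Dict[float, float] = {}
    counts: Dict[float, int] = {}
    for kick, dpv in zip(variants, dpvs):
        d = kick_distance(observed, kick, scales)
        edge = math.floor(d / bin_width) * bin_width
        sums[edge] = sums.get(edge, 0.0) + dpv
        counts[edge] = counts.get(edge, 0) + 1
    return {edge: sums[edge] / counts[edge] for edge in sorted(sums)}
```

A steep drop in mean ∆PV over the first few bins would indicate a pass whose value is highly sensitive to execution error.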
Figure 5: Qualitative multi-agent trajectory forecasting. This is an example clip generated with the ball trajectory as condition for each timestep. Left: ground truth. Right: best-of-20 sampled rollouts under ADE/FDE. Blue/red denote teams; yellow denotes the ball (conditioning).

Model | Receiver Top-1 Acc. ↑ | Pass Success Acc. ↑ | Pass Success AUROC ↑
Spearman, reported [24] | 0.679 | 0.805 | ∼0.85
Anzer, reported [2] | 0.899 | …
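Figure 5 reports best-of-20 rollouts under ADE/FDE. Using the common definitions (ADE: mean displacement over all agents and timesteps; FDE: displacement at the final timestep; best-of-K: minimum over K sampled rollouts), a minimal sketch follows. Whether the paper takes the minimum per scene or per agent, and whether minADE and minFDE come from the same sample, is not stated on this page; this version minimizes each independently per scene.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Rollout = List[List[Point]]  # indexed [agent][timestep] -> (x, y)

def ade_fde(pred: Rollout, truth: Rollout) -> Tuple[float, float]:
    """Scene-level ADE (all agents, all steps) and FDE (all agents, last step)."""
    errors = [math.dist(p, t) for pa, ta in zip(pred, truth) for p, t in zip(pa, ta)]
    final = [math.dist(pa[-1], ta[-1]) for pa, ta in zip(pred, truth)]
    return sum(errors) / len(errors), sum(final) / len(final)

def best_of_k(samples: List[Rollout], truth: Rollout) -> Tuple[float, float]:
    """minADE and minFDE over K sampled rollouts, each minimized independently."""
    scores = [ade_fde(s, truth) for s in samples]
    return min(a for a, _ in scores), min(f for _, f in scores)
```

Best-of-K rewards the sampler for covering the true future with at least one sample; it says little about how plausible the other K-1 samples are, which matters for MCPS because every sample feeds the value distribution.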
Figure 6: Global search, mean-difference surplus. Bars show each player's average observed-minus-counterfactual ∆PV over their passes.
Figure 7: Local search, percentile surplus. Bars show each player's average percentile rank of observed-pass ∆PV within the local counterfactual distribution.
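Figures 6 and 7 aggregate per-pass scores into per-player bars. A minimal sketch, assuming each pass record carries the passer, the observed ∆PV, and a list of counterfactual ∆PVs (the dictionary keys are hypothetical): compute the mean-difference surplus and the percentile rank per pass, then average each by player. In the paper the two figures draw on different counterfactual sets (global versus local search); the sketch simply uses whichever list it is given.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

def pass_scores(observed: float, counterfactual: List[float]) -> Tuple[float, float]:
    """Mean-difference surplus and percentile rank (share of samples <= observed)."""
    surplus = observed - mean(counterfactual)
    pct = sum(v <= observed for v in counterfactual) / len(counterfactual)
    return surplus, pct

def player_table(passes: List[dict]) -> Dict[str, Tuple[float, float]]:
    """Average both per-pass scores over each player's passes."""
    by_player = defaultdict(list)
    for p in passes:
        by_player[p["player"]].append(pass_scores(p["observed_dpv"], p["counterfactual_dpv"]))
    return {
        player: (mean(s for s, _ in scores), mean(q for _, q in scores))
        for player, scores in by_player.items()
    }
```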
Original abstract

We recast pass evaluation in football (soccer) as a Monte Carlo Tree Search (MCTS)-like evaluation problem whose components mostly exist in the literature under different names: a value model (possession value), a world model (multi-agent trajectories with ball interactions), and a policy over counterfactual actions (sampling pass variants with noise). Building on the first public high-fidelity tracking dataset with 3D ball trajectories from the Bundesliga, we introduce Monte Carlo Pass Search (MCPS), which infers kick parameters for each observed pass, samples execution variants and option variants, rolls each candidate forward with a ball-conditioned world model until the next ball interaction, and scores outcomes with a learned value model to obtain a distribution over gained value. This distribution enables distribution-aware attribution with two complementary execution-surplus scores used for analysis and ranking: mean-based and percentile-based scores. To make the world model sample-efficient under limited public data, we adapt a discrete-token, autoregressive trajectory generator from autonomous driving (SMART) and show it yields strong best-of-20 forecasting accuracy compared to baselines, while supporting fully hypothetical rollouts for downstream evaluation. We have released model checkpoints and code.
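The abstract describes SMART as a discrete-token, autoregressive generator. The sketch below is a deliberately simplified illustration of discrete motion tokens, not SMART's tokenizer: it assumes a small hand-made vocabulary of per-step 2D displacements, maps each observed step to its nearest token, and rebuilds the path by accumulating token displacements. The vocabulary and function names are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]

# Hypothetical vocabulary: no motion, plus 8 headings at two step lengths (metres per step).
VOCAB: List[Vec] = [(0.0, 0.0)] + [
    (r * math.cos(k * math.pi / 4), r * math.sin(k * math.pi / 4))
    for r in (0.2, 0.4)
    for k in range(8)
]

def tokenize(path: List[Vec]) -> List[int]:
    """Map each per-step displacement to the index of the nearest vocabulary entry."""
    tokens = []
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        step = (x1 - x0, y1 - y0)
        tokens.append(min(range(len(VOCAB)), key=lambda i: math.dist(step, VOCAB[i])))
    return tokens

def detokenize(start: Vec, tokens: List[int]) -> List[Vec]:
    """Rebuild a path by accumulating token displacements from a start point."""
    path = [start]
    for t in tokens:
        dx, dy = VOCAB[t]
        path.append((path[-1][0] + dx, path[-1][1] + dy))
    return path
```

An autoregressive model then predicts the next token index for every agent given the tokens so far, which turns trajectory generation into a classification problem over the vocabulary.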

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes Monte Carlo Pass Search (MCPS) for 3D counterfactual pass evaluation in football. It infers kick parameters from observed passes, samples execution and option variants, rolls candidates forward via an adapted SMART autoregressive trajectory generator (ball-conditioned world model) until the next ball interaction, and scores outcomes with a learned value model to obtain a distribution over gained value. This distribution supports two execution-surplus attribution scores (mean-based and percentile-based). The adapted world model achieves strong best-of-20 forecasting accuracy versus baselines on the public Bundesliga 3D tracking data; code and checkpoints are released.

Significance. If the counterfactual rollouts remain faithful, MCPS supplies a distribution-aware alternative to point-estimate pass metrics by reusing established value and trajectory components in a new domain. The code release and use of public 3D data are concrete strengths that support reproducibility and further work.

major comments (2)
  1. [world model adaptation and evaluation] The world-model evaluation reports strong best-of-20 forecasting accuracy only on observed trajectories. Because the mean-based and percentile-based execution-surplus scores are computed from rollouts under deliberately altered kick parameters, the absence of any reported test on counterfactual or off-distribution inputs leaves open the possibility that domain-shift artifacts from the driving-data pretraining corrupt the value distributions. This is load-bearing for the central attribution claim.
  2. [MCPS pipeline] The sampling of execution variants is controlled by a free noise distribution whose parameters are not ablated. Because the two surplus scores are explicitly distribution-aware, sensitivity of the reported rankings or attributions to this choice should be quantified.
minor comments (1)
  1. [methods] The distinction between 'execution variants' and 'option variants' is introduced without a compact notation or diagram; a small schematic would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's detailed review and constructive comments on our manuscript. We address each major comment below, proposing revisions to strengthen the paper where appropriate.

  1. Referee: [world model adaptation and evaluation] The world-model evaluation reports strong best-of-20 forecasting accuracy only on observed trajectories. Because the mean-based and percentile-based execution-surplus scores are computed from rollouts under deliberately altered kick parameters, the absence of any reported test on counterfactual or off-distribution inputs leaves open the possibility that domain-shift artifacts from the driving-data pretraining corrupt the value distributions. This is load-bearing for the central attribution claim.

    Authors: We agree that this is an important point and that direct evaluation on counterfactual inputs would provide stronger evidence for the reliability of the value distributions. However, ground-truth counterfactual trajectories do not exist by definition, making quantitative evaluation challenging. In the revised manuscript, we will add a dedicated discussion section addressing potential domain shift from the driving pretraining and include new experiments that test the world model on trajectories with controlled perturbations to initial kick parameters. These will serve as a proxy for off-distribution performance. We will also make the code for these additional analyses publicly available.
    revision: yes

  2. Referee: [MCPS pipeline] The sampling of execution variants is controlled by a free noise distribution whose parameters are not ablated. Because the two surplus scores are explicitly distribution-aware, sensitivity of the reported rankings or attributions to this choice should be quantified.

    Authors: We thank the referee for highlighting this. We will conduct an ablation study on the noise distribution parameters (such as the standard deviation of the Gaussian noise added to execution variants) and quantify the impact on the mean-based and percentile-based execution-surplus scores as well as on the resulting player rankings. The results of this sensitivity analysis will be included in the revised version of the paper.
    revision: yes
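A minimal sketch of the proposed sensitivity analysis, assuming a hypothetical `simulate_dpv(kick, rng)` stand-in for one world-model rollout followed by a value-model score: sweep the Gaussian noise scale, recompute the percentile-based score of one observed pass at each scale, and compare. The toy simulator (∆PV peaks at the observed kick and falls off quadratically) is invented purely to make the sweep runnable.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

Kick = Tuple[float, ...]

def percentile_score(
    observed_dpv: float,
    observed_kick: Kick,
    simulate_dpv: Callable[[Kick, random.Random], float],
    noise_std: float,
    n: int = 256,
    seed: int = 0,
) -> float:
    """Share of noisy execution variants whose dPV does not exceed the observed dPV."""
    rng = random.Random(seed)
    not_better = 0
    for _ in range(n):
        variant = tuple(p + rng.gauss(0.0, noise_std) for p in observed_kick)
        if simulate_dpv(variant, rng) <= observed_dpv:
            not_better += 1
    return not_better / n

def noise_sweep(observed_dpv, observed_kick, simulate_dpv, stds: Sequence[float]) -> Dict[float, float]:
    """Recompute the percentile score under each candidate noise scale."""
    return {s: percentile_score(observed_dpv, observed_kick, simulate_dpv, s) for s in stds}

# Toy stand-in (hypothetical): dPV peaks at kick (1.0, 0.0) and falls off quadratically.
def toy_simulate_dpv(kick: Kick, rng: random.Random) -> float:
    return 0.1 - 0.05 * ((kick[0] - 1.0) ** 2 + kick[1] ** 2)  # rng unused in this toy

sweep = noise_sweep(0.08, (1.0, 0.0), toy_simulate_dpv, [0.1, 0.5, 1.0])
```

In this toy setup the same observed pass moves from roughly the bottom of its counterfactual distribution to roughly the top as the noise scale grows, which is the kind of sensitivity the referee asks to have quantified.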

Circularity Check

0 steps flagged

No circularity: MCPS derives evaluation scores from independent trained models and rollouts

full rationale

The paper trains a world model (adapted SMART) on trajectory data via supervised learning and a separate value model for possession value. It then performs Monte Carlo rollouts of sampled pass variants through these models to produce distributions over gained value for attribution. This simulation-based process is not equivalent by construction to the input pass labels or fitted parameters; the downstream mean-based and percentile-based execution-surplus scores are generated outputs rather than tautological renamings or direct fits. No self-citation chains, ansatzes smuggled via citation, or uniqueness theorems from the same authors are load-bearing. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The method depends on two learned models whose parameters are fitted to data and on the domain assumption that an autoregressive trajectory model trained on driving data can be repurposed for football without introducing large systematic errors in counterfactual rollouts.

free parameters (1)
  • noise distribution for sampling pass execution variants
    Controls how widely the system explores alternative kick parameters; chosen to reflect realistic execution variability.
axioms (1)
  • domain assumption: An autoregressive discrete-token trajectory model can faithfully simulate multi-agent ball interactions in hypothetical football scenarios
    Invoked when the paper states that the adapted SMART model supports fully hypothetical rollouts for downstream evaluation.

pith-pipeline@v0.9.1-grok · 5738 in / 1573 out tokens · 28017 ms · 2026-06-27T13:15:36.777641+00:00 · methodology


Reference graph

Works this paper leans on

34 extracted references · 3 linked inside Pith

  [1] Eloi Alonso, Adam Jelley, Vincent Micheli, Anssi Kanervisto, Amos J Storkey, Tim Pearce, and François Fleuret. Diffusion for world modeling: Visual details matter in Atari. Advances in Neural Information Processing Systems, 37:58757–58791, 2024.

  [2] Gabriel Anzer and Pascal Bauer. Expected passes: Determining the difficulty of a pass in football (soccer) using spatio-temporal data. Data Mining and Knowledge Discovery, 36(1):295–317, 2022.

  [3] Manuel Bassek, Robert Rein, Hendrik Weber, and Daniel Memmert. An integrated dataset of spatiotemporal and event data in elite soccer. Scientific Data, 12(1):195, 2025.

  [4] P. Bransen, J. Van Haaren, and J. Davis. Valuing on-the-ball actions in soccer: A critical comparison of expected threat and VAEP. Journal of Sports Analytics, 6(1):1–10, 2020.

  [5] Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11621–11631, 2020.

  [6] Guillem Capellera, Luis Ferraz, Antonio Rubio, Antonio Agudo, and Francesc Moreno-Noguer. Transportmer: A holistic approach to trajectory understanding in multi-agent sports. In Proceedings of the Asian Conference on Computer Vision, pages 1652–1670, 2024.

  [7] Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, et al. Argoverse: 3D tracking and forecasting with rich maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8748–8757, 2019.

  [8] L. Decroos, P. Bransen, J. Van Haaren, and J. Davis. Actions speak louder than goals: Valuing player actions in soccer. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1851–1861, 2019.

  [9] Javier Fernández, Luke Bornn, and Daniel Cervone. A framework for the fine-grained evaluation of the instantaneous expected value of soccer possessions. Machine Learning, 110(6):1389–1427, 2021.

  [10] Katerina Fragkiadaki, Pulkit Agrawal, Sergey Levine, and Jitendra Malik. Learning visual predictive models of physics for playing billiards. arXiv preprint arXiv:1511.07404, …

  [11] Keisuke Fujii, Kazushi Tsutsui, Atom Scott, Hiroshi Nakahara, Naoya Takeishi, and Yoshinobu Kawahara. Adaptive action supervision in reinforcement learning from real-world multi-agent demonstrations. arXiv preprint arXiv:2305.13030, 2023.

  [12] Floris R Goes, Matthias Kempe, Laurentius A Meerhoff, and Koen APM Lemmink. Not every pass can be an assist: A data-driven model to measure pass effectiveness in professional soccer matches. Big Data, 7(1):57–70, 2019.

  [13] Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023.

  [14] Yuntao Han, Qibin Zhou, and Fuqing Duan. A game strategy model in the digital curling system based on NFSP. Complex & Intelligent Systems, 8(3):1857–1863, 2022.

  [15] Simon Ji, Shouzhuo Yang, Wilber Dominguez, and Cacey Bester. Using physics simulations to find targeting strategies in competitive bowling. arXiv preprint arXiv:2210.06753, …

  [16] Kyowoon Lee, Sol-A Kim, Jaesik Choi, and Seong-Whan Lee. Deep reinforcement learning in continuous action spaces: A case study in the game of simulated curling. In International Conference on Machine Learning, pages 2937–…

  [17] Daniel Linke, Daniel Link, and Martin Lames. Football-specific validity of TRACAB's optical video tracking systems. PLoS One, 15(3):e0230179, 2020.

  [18] Min-hwan Oh, Suraj Keshri, and Garud Iyengar. Graphical model for basketball match simulation. In Proceedings of the 2015 MIT Sloan Sports Analytics Conference, Boston, MA, USA, 2015.

  [19] Shayegan Omidshafiei, Daniel Hennes, Marta Garnelo, Eugene Tarassov, Zhe Wang, Romuald Elie, Jerome T Connor, Paul Muller, Ian Graham, William Spearman, et al. Time-series imputation of temporally-occluded multiagent trajectories. arXiv preprint arXiv:2106.04219, 2021.

  [20] Pegah Rahimian and Laszlo Toka. Inferring the strategy of offensive and defensive play in soccer with inverse reinforcement learning. In International Workshop on Machine Learning and Data Mining for Sports Analytics, pages 26–…

  [21] Sarah Rudd. A framework for tactical analysis and individual offensive production assessment in soccer using Markov chains. In New England Symposium on Statistics in Sports, pages 36–55, 2011.

  [22] Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, et al. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588(7839):604–609, 2020.

  [23] Karun Singh. Expected threat, 2019.

  [24] William Spearman, Austin Basye, Greg Dick, Ryan Hotovy, and Paul Pop. Physics-based modeling of pass probabilities in soccer. In Proceedings of the 11th MIT Sloan Sports Analytics Conference, Boston, MA, 2017.

  [25] Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, et al. Scalability in perception for autonomous driving: Waymo Open Dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2446–2454, 2020.

  [26] Masakiyo Teranishi, Kazushi Tsutsui, Kazuya Takeda, and Keisuke Fujii. Evaluation of creating scoring opportunities for teammates in soccer via trajectory prediction. In International Workshop on Machine Learning and Data Mining for Sports Analytics, pages 53–73. Springer, 2022.

  [27] Zhe Wang, Petar Veličković, Daniel Hennes, Nenad Tomašev, Laurel Prince, Michael Kaisers, Yoram Bachrach, Romuald Elie, Li Kevin Wenliang, Federico Piccinini, et al. TacticAI: An AI assistant for football tactics. Nature Communications, 15(1):1906, 2024.

  [28] Xinyu Wei, Patrick Lucey, Stephen Vidas, Stuart Morgan, and Sridha Sridharan. Forecasting events using an augmented hidden conditional random field. In Computer Vision–ACCV 2014: 12th Asian Conference on Computer Vision, Singapore, Singapore, November 1-5, 2014, Revised Selected Papers, Part IV 12, pages 569–582. Springer, 2015.

  [29] Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, et al. Argoverse 2: Next generation datasets for self-driving perception and forecasting. arXiv preprint arXiv:2301.00493, 2023.

  [30] Dong-Ok Won, Klaus-Robert Müller, and Seong-Whan Lee. An adaptive deep reinforcement learning framework enables curling robots with human-like performance in real-world conditions. Science Robotics, 5(46):eabb9764, 2020.

  [31] Wei Wu, Xiaoxin Feng, Ziyan Gao, and Yuheng Kan. SMART: Scalable multi-agent real-time motion generation via next-token prediction. Advances in Neural Information Processing Systems, 37:114048–114071, 2024.

  [32] Qian Xiao, Zongmin Li, Xiangdong Wang, Yujie Liu, Yachuan Li, Chaozhi Yang, and Feimo Li. Policy decision of curling in real competition scenes. Complex & Intelligent Systems, 9(3):3301–3312, 2023.

  [33] Yi Xu and Yun Fu. Sports-traj: A unified trajectory generation model for multi-agent movement in sports. arXiv preprint arXiv:2405.17680, 2024.

  [34] Timothy Yee, Viliam Lisý, Michael H Bowling, and S Kambhampati. Monte Carlo tree search in continuous action spaces with execution uncertainty. In IJCAI, pages 690–697, 2016.