pith. machine review for the scientific record.

arxiv: 2605.10018 · v1 · submitted 2026-05-11 · 💻 cs.LG

The Value of Mechanistic Priors in Sequential Decision Making

Pith reviewed 2026-05-12 03:23 UTC · model grok-4.3

classification 💻 cs.LG
keywords mechanistic priors · sequential decision making · Bayesian regret · residual entropy · hybrid models · sample complexity · occupancy measure · pharmacokinetic simulation

The pith

With a mechanistic prior, Bayesian regret scales with the residual entropy H_mech, which gives a theoretical sample complexity reduction of H(μ)/H_mech over an uninformed baseline in sequential decision making.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to measure exactly how much hybrid mechanistic models—physical priors paired with learned residuals—cut the data needed for effective sequential decisions. It defines mechanistic information as the mutual information between a model's suggested policy and the true optimal policy, computed through an occupancy-weighted bias that captures how often states are visited. In the large-sample limit this leads to regret that grows only with the uncertainty remaining after the prior is applied, giving a concrete reduction factor over a baseline that starts with no prior knowledge at all. For the small-sample regime the work supplies a lower bound on the extra cost paid by a confidently mistaken prior. These results matter for settings like medical dosing where collecting each new data point is costly or risky and where generic priors may lose critical structure.

Core claim

We introduce the mechanistic information of a model (the mutual information between the model's recommended policy π̂ and the true optimal policy π*), quantified via an occupancy-weighted bias B_μ. In the asymptotic regime (large N), matched bounds reveal that Bayesian regret scales with the residual entropy H_mech, delivering a theoretical sample complexity reduction of H(μ)/H_mech compared to an uninformed baseline. We also provide a model certificate to determine empirical sample efficiency. In the burn-in regime (small N) we establish a lower bound on the penalty incurred by confidently wrong priors, and demonstrate both sets of bounds on 5-FU dosing simulations motivated by published FOLFOX pharmacokinetic data.

What carries the argument

Mechanistic information: the mutual information between the model's recommended policy and the true optimal policy, computed via occupancy-weighted bias B_μ that determines residual entropy H_mech and the prior's value.
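As a rough illustration of the quantities named above, the sketch below computes the mutual information between a recommended policy and the optimal policy from a small joint distribution, then forms a residual entropy and a reduction ratio. It assumes H_mech = H(π*) − I(π̂; π*) and uses H(π*) as a stand-in for the baseline entropy H(μ). Both are assumptions made for illustration, as are the function names and the toy joint table; none of them are taken from the paper.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def mutual_information(joint):
    """I(X; Y) in nats for a joint table joint[x][y] that sums to 1."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi


def reduction_ratio(joint):
    """Return (baseline entropy, information, residual entropy, ratio).

    Assumption: rows index the recommended policy, columns index the
    optimal policy, and residual entropy = baseline entropy - information.
    """
    py = [sum(col) for col in zip(*joint)]
    h_base = entropy(py)
    info = mutual_information(joint)
    h_mech = h_base - info
    ratio = h_base / h_mech if h_mech > 0 else math.inf
    return h_base, info, h_mech, ratio


# Toy joint distribution over 3 candidate policies (made-up numbers):
# the recommended policy agrees with the optimal one most of the time.
joint = [
    [0.30, 0.02, 0.01],
    [0.02, 0.30, 0.02],
    [0.01, 0.02, 0.30],
]

h_base, info, h_mech, ratio = reduction_ratio(joint)
print(f"baseline entropy={h_base:.3f}  information={info:.3f}")
print(f"residual entropy={h_mech:.3f}  reduction ratio={ratio:.2f}")
```

Under these assumptions, a recommendation that is independent of the optimal policy carries zero information and gives a ratio of 1, while a recommendation that nearly determines the optimal policy drives the residual entropy toward zero and the ratio upward.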

If this is right

  • Bayesian regret grows linearly with residual entropy after the mechanistic prior is applied.
  • A model certificate can be computed to certify empirical sample efficiency from observable quantities.
  • Confidently incorrect priors incur a bounded but positive penalty in the small-sample regime.
  • Physically grounded priors retain higher mechanistic information than LLM priors on the same task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The occupancy-weighted definition suggests a general recipe for injecting domain knowledge into any sequential decision problem where visitation frequencies can be estimated.
  • Safety-critical applications should prefer physically derived priors over broad generative models precisely because the latter can erase mechanistic structure.
  • The framework could be used to rank candidate priors by their expected H_mech before any online interaction begins.

Load-bearing premise

The mechanistic prior is sufficiently accurate and the occupancy measure μ accurately reflects policy overlap without introducing unaccounted bias, especially when data are scarce.

What would settle it

Measure empirical Bayesian regret in the 5-FU dosing simulation for increasing numbers of patients and check whether it tracks the predicted linear scaling with residual entropy H_mech once N is large.
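The shape of such an experiment can be sketched on a much simpler problem. The code below is a toy Beta-Bernoulli bandit with Thompson sampling, not the paper's 5-FU simulator: it measures mean cumulative regret for increasing N under an uninformed prior and under a prior concentrated near the true arm means. The arm means, prior parameters, and function names are all hypothetical choices made for illustration.

```python
import random


def thompson_regret(true_means, prior, horizon, rng):
    """Cumulative pseudo-regret of Beta-Bernoulli Thompson sampling."""
    alpha = [a for a, _ in prior]
    beta = [b for _, b in prior]
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Sample one plausible mean per arm from the current posterior.
        samples = [rng.betavariate(alpha[k], beta[k]) for k in range(len(true_means))]
        arm = max(range(len(samples)), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate Beta update for the pulled arm.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        regret += best - true_means[arm]
    return regret


def mean_regret(true_means, prior, horizon, runs, seed=0):
    """Average cumulative regret over independent runs."""
    rng = random.Random(seed)
    total = sum(thompson_regret(true_means, prior, horizon, rng) for _ in range(runs))
    return total / runs


# Hypothetical setup: three arms, a flat prior, and a prior centred on the truth.
TRUE_MEANS = [0.3, 0.5, 0.7]
UNINFORMED = [(1.0, 1.0)] * 3
INFORMED = [(30.0, 70.0), (50.0, 50.0), (70.0, 30.0)]

if __name__ == "__main__":
    for n in (50, 200, 800):
        flat = mean_regret(TRUE_MEANS, UNINFORMED, n, runs=30)
        informed = mean_regret(TRUE_MEANS, INFORMED, n, runs=30)
        print(f"N={n:4d}  uninformed={flat:6.2f}  informed={informed:6.2f}")
```

To test the paper's prediction one would replace the bandit with the dosing simulator and plot regret against the computed H_mech; this sketch only shows the measurement loop.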

Figures

Figures reproduced from arXiv: 2605.10018 by Gal Benor, Itai Shufaro, Shie Mannor.

Figure 1: Three layers of the hybrid model learning setting.
Figure 2: Sensitivity of the model-quality certificate to parameters.
Figure 3: Critical bias as a function of K. The blue curve corresponds to the theoretical sample complexity ratio. The dotted line corresponds to the baseline ratio (ρ = 1). All other parameters are held at their calibrated values (B_μ = 0.22, σ = 0.40, κ_μ = 1.8, d_F = 3, N = 12).

Parameter | Working value | Provenance | Sweep range
T | 46 h | FOLFOX6 standard [13, 20] | -
Target window | [20, 30] mg·h/L | Standard [5, 20, 35] | -
S | 0.…
Original abstract

Hybrid mechanistic models, physical priors with learned residuals, promise to reduce the data required for good decisions, but have no computable criterion to test this. We characterize the value of mechanistic priors in sequential decision-making within both asymptotic and burn-in regimes. To formalize this, we introduce the mechanistic information of a model -- the mutual information between the model's recommended policy $\hat{\pi}$ and the true optimal policy $\pi^*$ -- quantified via an occupancy-weighted bias $B_\mu$. In the asymptotic regime (large $N$), matched bounds reveal that Bayesian regret scales with the residual entropy $H_{\mathrm{mech}}$, delivering a theoretical sample complexity reduction of $H(\mu)/H_{\mathrm{mech}}$ compared to an uninformed baseline. Furthermore, we provide a model certificate to determine empirical sample efficiency. Complementarily, in the clinically relevant burn-in regime (small $N$), we establish a lower bound on the penalty incurred by confidently wrong priors. We demonstrate both the asymptotic and burn-in bounds across 5-fluorouracil (5-FU) dosing simulations motivated by published FOLFOX pharmacokinetic data, where a hybrid prior yields large sample-efficiency gains in the burn-in regime. Finally, we contrast these grounded models with LLM priors, demonstrating that LLMs can suffer severe losses in mechanistic information, thereby motivating the exclusive use of physically-grounded priors for safety-critical applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims to characterize the value of mechanistic priors in sequential decision making by introducing 'mechanistic information' as the mutual information I(π̂; π*) quantified by occupancy-weighted bias B_μ. In the asymptotic large-N regime, it asserts matched bounds showing Bayesian regret scales with residual entropy H_mech, yielding sample complexity reduction H(μ)/H_mech vs uninformed baseline. It also gives a lower bound on penalty for wrong priors in small-N burn-in regime, provides a model certificate for empirical efficiency, and demonstrates gains in 5-FU dosing simulations from FOLFOX PK data, while noting LLM priors can have low mechanistic information.

Significance. If the bounds are rigorously established without circularity, this provides a novel theoretical framework for assessing data efficiency gains from hybrid mechanistic models in RL, with direct relevance to clinical applications. The inclusion of both asymptotic and burn-in analyses, plus empirical validation on pharmacokinetic simulations, adds practical value. The paper deserves explicit credit for a reproducible simulation setup motivated by published data and for highlighting the risks of LLM priors in safety-critical settings.

major comments (2)
  1. [Definition of mechanistic information and B_μ] The occupancy measure μ is described as the state-action occupancy of the policy recommended by the mechanistic model. This choice risks making B_μ and thus H_mech dependent on the prior itself, potentially rendering the regret scaling with H_mech and the reduction factor H(μ)/H_mech tautological rather than an independent prediction. Please provide a formal definition (e.g., Eq. for B_μ) and show that the mutual information remains unbiased or that the bounds hold for a fixed reference μ independent of the model.
  2. [Asymptotic regime analysis] The claim of matched upper and lower bounds on Bayesian regret scaling with H_mech is central but lacks visible derivation steps or exact definitions of H_mech in the abstract. In the section presenting these bounds, include the key equations and proof outline to allow verification that the scaling is not an artifact of the definition.
minor comments (2)
  1. [Abstract] The abstract mentions a 'model certificate to determine empirical sample efficiency' but provides no details; expand briefly or reference the relevant section.
  2. [Simulations] For the 5-FU dosing simulations, specify the number of independent runs, exact controls for the uninformed baseline, and any hyperparameter choices to ensure reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important aspects of our theoretical framework. We address each major comment below and will incorporate revisions to enhance clarity and rigor.

  1. Referee: [Definition of mechanistic information and B_μ] The occupancy measure μ is described as the state-action occupancy of the policy recommended by the mechanistic model. This choice risks making B_μ and thus H_mech dependent on the prior itself, potentially rendering the regret scaling with H_mech and the reduction factor H(μ)/H_mech tautological rather than an independent prediction. Please provide a formal definition (e.g., Eq. for B_μ) and show that the mutual information remains unbiased or that the bounds hold for a fixed reference μ independent of the model.

    Authors: We acknowledge the potential for circularity in the current presentation and agree that clarification is needed. In the revised manuscript, we will provide the formal definition of B_μ as the occupancy-weighted bias with respect to a fixed reference occupancy measure μ, chosen independently of the mechanistic model (e.g., the occupancy induced by the true optimal policy π* or a baseline policy). This ensures that the mechanistic information I(π̂; π*) is defined with respect to an external reference, avoiding dependence on the prior. We will also demonstrate that the regret bounds hold under this fixed μ, confirming they are not tautological. revision: yes

  2. Referee: [Asymptotic regime analysis] The claim of matched upper and lower bounds on Bayesian regret scaling with H_mech is central but lacks visible derivation steps or exact definitions of H_mech in the abstract. In the section presenting these bounds, include the key equations and proof outline to allow verification that the scaling is not an artifact of the definition.

    Authors: We agree that additional detail on the derivations would improve verifiability. In the revised version, we will expand the section on the asymptotic regime to include the precise definition of the residual entropy H_mech (as the entropy of the residual uncertainty after incorporating the mechanistic prior) and outline the key steps in the proofs for both the upper and lower bounds on Bayesian regret. This will explicitly show how the scaling with H_mech arises from information-theoretic arguments and is independent of definitional artifacts. revision: yes

Circularity Check

1 step flagged

Regret scaling with H_mech and reduction H(μ)/H_mech reduce to the definition of mechanistic information via model-induced μ

specific steps
  1. self-definitional [Abstract (mechanistic information definition and asymptotic regime claim)]
    "we introduce the mechanistic information of a model -- the mutual information between the model's recommended policy π̂ and the true optimal policy π* -- quantified via an occupancy-weighted bias B_μ. In the asymptotic regime (large N), matched bounds reveal that Bayesian regret scales with the residual entropy H_mech, delivering a theoretical sample complexity reduction of H(μ)/H_mech compared to an uninformed baseline."

    H_mech is the residual entropy after subtracting the mechanistic information I(π̂; π*), which is itself quantified by B_μ using the occupancy measure μ induced by the model's recommended policy π̂. Substituting the model's own occupancy for a reference measure alters B_μ and therefore H_mech, so the scaling of regret with H_mech and the explicit reduction factor H(μ)/H_mech are obtained by construction from the prior's definition rather than derived independently.

full rationale

The paper introduces mechanistic information as I(π̂; π*) quantified by occupancy-weighted bias B_μ, then states that matched bounds show Bayesian regret scales with residual entropy H_mech (entropy after subtracting this I) and yields sample-complexity reduction H(μ)/H_mech. Because μ is the state-action occupancy of the policy recommended by the mechanistic model itself, both B_μ and the resulting H_mech are constructed from the prior's own output. The claimed asymptotic scaling and reduction factor are therefore equivalent to the amount of information the prior was defined to capture, rather than an independent first-principles prediction. The burn-in lower bound on wrong priors stands separately and does not exhibit this reduction.
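The concern about the choice of μ can be made concrete with a small numerical sketch. The paper's formula for B_μ is not given on this page, so the code below uses a hypothetical stand-in, B_μ = Σ_s μ(s)·|b(s)|, where b(s) is a per-state model error. The states, error values, and both occupancy vectors are invented for illustration only.

```python
def occupancy_weighted_bias(occupancy, bias):
    """Hypothetical stand-in for B_mu: sum over states of mu(s) * |bias(s)|."""
    if abs(sum(occupancy) - 1.0) > 1e-9:
        raise ValueError("occupancy must sum to 1")
    return sum(m * abs(b) for m, b in zip(occupancy, bias))


# Made-up per-state model error on four toy states.
bias = [0.05, 0.10, 0.40, 0.60]

# Occupancy induced by the model's own policy: in this toy case it rarely
# visits the states where the model is most wrong.
mu_model = [0.60, 0.30, 0.08, 0.02]

# A fixed reference occupancy chosen independently of the model.
mu_reference = [0.25, 0.25, 0.25, 0.25]

b_model = occupancy_weighted_bias(mu_model, bias)
b_reference = occupancy_weighted_bias(mu_reference, bias)

print(f"B under model-induced occupancy: {b_model:.4f}")
print(f"B under reference occupancy:     {b_reference:.4f}")
```

In this toy case the same model looks far less biased when weighted by its own visitation pattern than when weighted by the reference measure, which is the kind of dependence the circularity check and the first major comment ask the authors to rule out.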

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

The central claims rest on newly introduced quantities (mechanistic information, B_μ, H_mech) and domain assumptions about hybrid models in MDPs; no explicit free parameters are fitted in the abstract description.

axioms (1)
  • domain assumption: The decision problem is a Markov decision process where the mechanistic component provides a structured prior over dynamics or rewards.
    Required for the hybrid model to be well-defined and for the occupancy measure to be meaningful.
invented entities (3)
  • mechanistic information (no independent evidence)
    purpose: Quantify the value of a mechanistic prior via mutual information between recommended and optimal policies
    Newly defined to formalize the benefit of hybrid models.
  • occupancy-weighted bias B_μ (no independent evidence)
    purpose: Measure the mechanistic information through policy overlap weighted by state occupancy
    Introduced to operationalize the mutual information definition.
  • residual entropy H_mech (no independent evidence)
    purpose: Capture remaining uncertainty after incorporating the mechanistic prior
    Used to scale Bayesian regret in the asymptotic analysis.

pith-pipeline@v0.9.0 · 5546 in / 1554 out tokens · 100490 ms · 2026-05-12T03:23:09.006116+00:00 · methodology



Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 1 internal anchor

  1. [1]

    Analysis of thompson sampling for the multi-armed bandit problem

    Shipra Agrawal and Navin Goyal. “Analysis of thompson sampling for the multi-armed bandit problem”. In: Conference on learning theory. JMLR Workshop and Conference Proceedings. 2012, pp. 39.1–39.26

  2. [2]

    Finite-time Analysis of the Multiarmed Bandit Problem

    Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. “Finite-time Analysis of the Multiarmed Bandit Problem”. In: Mach. Learn. 47.2–3 (May 2002), pp. 235–256. ISSN: 0885-6125. DOI: 10.1023/A:1013689704352. URL: https://doi.org/10.1023/A:1013689704352

  3. [3]

    Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model

    M. G. Azar, R. Munos, and H. J. Kappen. “Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model”. In: Machine Learning 91.3 (2013), pp. 325–349

  4. [4]

    Perturbation analysis of optimization problems

    J Frédéric Bonnans and Alexander Shapiro. Perturbation analysis of optimization problems. Springer Science & Business Media, 2013

  5. [5]

    Individual fluorouracil dose adjustment in FOLFOX based on pharmacokinetic follow-up compared with conventional body-area-surface dosing: a phase II, proof-of-concept study

    Olivier Capitain et al. “Individual fluorouracil dose adjustment in FOLFOX based on pharmacokinetic follow-up compared with conventional body-area-surface dosing: a phase II, proof-of-concept study”. In: Clinical colorectal cancer 11.4 (2012), pp. 263–267

  6. [6]

    Neural ordinary differential equations

    Ricky TQ Chen et al. “Neural ordinary differential equations”. In: Advances in neural information processing systems 31 (2018)

  7. [7]

    On kernelized multi-armed bandits

    Sayak Ray Chowdhury and Aditya Gopalan. “On kernelized multi-armed bandits”. In: Proceedings of the 34th International Conference on Machine Learning - Volume 70. ICML’17. Sydney, NSW, Australia: JMLR.org, 2017, pp. 844–853

  8. [8]

    Elements of Information Theory

    Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. 2nd ed. Wiley-Interscience, 2006

  9. [9]

    On the sample complexity of the linear quadratic regulator

    Sarah Dean et al. “On the sample complexity of the linear quadratic regulator”. In: Foundations of Computational Mathematics 20.4 (2020), pp. 633–679

  10. [10]

    The arrival of digital twins and in silico trials in drug development

    Ashley L. Eadie et al. “The arrival of digital twins and in silico trials in drug development”. In: Nature Medicine (2026)

  11. [11]

    Tree-based batch mode reinforcement learning

    Damien Ernst, Pierre Geurts, and Louis Wehenkel. “Tree-based batch mode reinforcement learning”. In: Journal of Machine Learning Research 6 (2005)

  12. [12]

    Pharmacokinetically guided algorithm of 5-fluorouracil dosing: a meta-analysis

    L. Fang, W. Xin, H. Ding, et al. “Pharmacokinetically guided algorithm of 5-fluorouracil dosing: a meta-analysis”. In: Scientific Reports 6 (2016), p. 25913

  13. [13]

    Circadian variation in plasma 5-fluorouracil concentrations during a 24 hour constant-rate infusion

    Gini F Fleming et al. “Circadian variation in plasma 5-fluorouracil concentrations during a 24 hour constant-rate infusion”. In: BMC cancer 15.1 (2015), p. 69

  14. [14]

    Erick Gamelin et al. “Individual Fluorouracil Dose Adjustment Based on Pharmacokinetic Follow-Up Compared With Conventional Dosage: Results of a Multicenter Randomized Trial of Patients With Metastatic Colorectal Cancer”. In: Journal of Clinical Oncology 26.13 (2008). PMID: 18445839, pp. 2099–2105. DOI: 10.1200/JCO.2007.13.3934. eprint: https://...

  15. [15]

    LISA: Learning Interpretable Skill Abstractions from Language

    Divyansh Garg et al. “LISA: Learning Interpretable Skill Abstractions from Language”. In: Advances in Neural Information Processing Systems. Ed. by Alice H. Oh et al. 2022. URL: https://openreview.net/forum?id=XZhipvOUBB

  16. [16]

    Thompson Sampling for Complex Online Problems

    Aditya Gopalan, Shie Mannor, and Yishay Mansour. “Thompson Sampling for Complex Online Problems”. In: Proceedings of the 31st International Conference on Machine Learning. Ed. by Eric P. Xing and Tony Jebara. Vol. 32. Proceedings of Machine Learning Research. Beijing, China: PMLR, 2014, pp. 100–108. URL: https://proceedings.mlr.press/v32/gopalan14.html

  17. [17]

    An Asymptotically Optimal Bandit Algorithm for Bounded Support Models

    Junya Honda and Akimichi Takemura. “An Asymptotically Optimal Bandit Algorithm for Bounded Support Models”. In: COLT 2010 - The 23rd Conference on Learning Theory. Jan. 2010, pp. 67–79

  18. [18]

    Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

    Brian Ichter et al. “Do As I Can, Not As I Say: Grounding Language in Robotic Affordances”. In: 6th Annual Conference on Robot Learning. 2022. URL: https://openreview.net/forum?id=bdHkMjBJG_w

  19. [19]

    Provably efficient reinforcement learning with linear function approximation

    Chi Jin et al. “Provably efficient reinforcement learning with linear function approximation”. In: Conference on learning theory. PMLR. 2020, pp. 2137–2143

  20. [20]

    Modeling the 5-fluorouracil area under the curve versus dose relationship to develop a pharmacokinetic dosing algorithm for colorectal cancer patients receiving FOLFOX6

    Rajesh R. Kaldate et al. “Modeling the 5-fluorouracil area under the curve versus dose relationship to develop a pharmacokinetic dosing algorithm for colorectal cancer patients receiving FOLFOX6”. In: The Oncologist 17.3 (2012), pp. 296–302

  21. [21]

    Physics-informed machine learning

    George Em Karniadakis et al. “Physics-informed machine learning”. In: Nature Reviews Physics 3.6 (2021), pp. 422–440

  22. [22]

    Meta-Thompson Sampling

    Branislav Kveton et al. “Meta-Thompson Sampling”. In: Proceedings of the 38th International Conference on Machine Learning. Ed. by Marina Meila and Tong Zhang. Vol. 139. Proceedings of Machine Learning Research. PMLR, 2021, pp. 5884–5893. URL: https://proceedings.mlr.press/v139/kveton21a.html

  23. [23]

    Asymptotically efficient adaptive allocation rules

    T. L. Lai and Herbert Robbins. “Asymptotically efficient adaptive allocation rules”. In: Advances in Applied Mathematics 6.1 (1985), pp. 4–22. ISSN: 0196-8858. DOI: https://doi.org/10.1016/0196-8858(85)90002-8. URL: https://www.sciencedirect.com/science/article/pii/0196885885900028

  24. [24]

    Bandit algorithms

    Tor Lattimore and Csaba Szepesvári. Bandit algorithms. Cambridge University Press, 2020

  25. [25]

    Bayesian multi-task reinforcement learning

    Alessandro Lazaric and Mohammad Ghavamzadeh. “Bayesian multi-task reinforcement learning”. In: Proceedings of the 27th International Conference on Machine Learning. ICML’10. Haifa, Israel: Omnipress, 2010, pp. 599–606. ISBN: 9781605589077

  26. [26]

    Circadian timing in cancer treatments

    Francis A. Lévi et al. “Circadian timing in cancer treatments”. In: Annual Review of Pharmacology and Toxicology 50 (2010), pp. 377–421

  27. [27]

    Drug monitoring detects under- and overdosing in patients receiving 5-fluorouracil-containing chemotherapy: results of a prospective, multicenter German observational study

    M. Li et al. “Drug monitoring detects under- and overdosing in patients receiving 5-fluorouracil-containing chemotherapy: results of a prospective, multicenter German observational study”. In: ESMO Open 8.2 (2023), p. 101201

  28. [28]

    On the Prior Sensitivity of Thompson Sampling

    Che-Yu Liu and Lihong Li. “On the Prior Sensitivity of Thompson Sampling”. In: Proceedings of the 27th International Conference on Algorithmic Learning Theory (ALT). Springer, 2016, pp. 321–336. DOI: 10.1007/978-3-319-46379-7_22

  29. [29]

    5-FU therapeutic drug monitoring as a valuable option to reduce toxicity in patients with gastrointestinal cancer

    Katarzyna Morawska et al. “5-FU therapeutic drug monitoring as a valuable option to reduce toxicity in patients with gastrointestinal cancer”. In: Oncotarget 9.14 (2018), p. 11559

  30. [30]

    Universal Differential Equations for Scientific Machine Learning

    Christopher Rackauckas et al. “Universal differential equations for scientific machine learning”. In: arXiv preprint arXiv:2001.04385 (2020)

  31. [31]

    Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations

    M. Raissi, P. Perdikaris, and G.E. Karniadakis. “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations”. In: Journal of Computational Physics 378 (2019), pp. 686–707. ISSN: 0021-9991. DOI: https://doi.org/10.1016/j.jcp.2018.10.045. URL: https:...

  32. [32]

    Learning to Optimize via Posterior Sampling

    Daniel Russo and Benjamin Van Roy. “Learning to Optimize via Posterior Sampling”. In: Mathematics of Operations Research 39.4 (2014), pp. 1221–1243. ISSN: 0364765X, 15265471. URL: http://www.jstor.org/stable/24541007 (visited on 04/11/2026)

  33. [33]

    An information-theoretic analysis of thompson sampling

    Daniel Russo and Benjamin Van Roy. “An information-theoretic analysis of thompson sampling”. In: Journal of Machine Learning Research 17.68 (2016), pp. 1–30

  34. [34]

    A tutorial on thompson sampling

    Daniel J. Russo et al. “A tutorial on thompson sampling”. In: Foundations and Trends® in Machine Learning 11.1 (2018), pp. 1–99

  35. [35]

    Pharmacokinetically guided dose adjustment of 5-fluorouracil: a rational approach to improving therapeutic outcomes

    M. Wasif Saif et al. “Pharmacokinetically guided dose adjustment of 5-fluorouracil: a rational approach to improving therapeutic outcomes”. In: JNCI: Journal of the National Cancer Institute 101.22 (2009), pp. 1543–1552

  36. [36]

    Informing sequential clinical decision-making through reinforcement learning: an empirical study

    Susan M. Shortreed et al. “Informing sequential clinical decision-making through reinforcement learning: an empirical study”. In: Mach. Learn. 84.1–2 (July 2011), pp. 109–136. ISSN: 0885-6125. DOI: 10.1007/s10994-010-5229-0. URL: https://doi.org/10.1007/s10994-010-5229-0

  37. [37]

    On Bits and Bandits: Quantifying the Regret-Information Trade-off

    Itai Shufaro et al. “On Bits and Bandits: Quantifying the Regret-Information Trade-off”. In: The Thirteenth International Conference on Learning Representations. 2025. URL: https://openreview.net/forum?id=0oWGVvC6oq

  38. [38]

    Gaussian process optimization in the bandit setting: no regret and experimental design

    Niranjan Srinivas et al. “Gaussian process optimization in the bandit setting: no regret and experimental design”. In: Proceedings of the 27th International Conference on Machine Learning. ICML’10. Haifa, Israel: Omnipress, 2010, pp. 1015–1022. ISBN: 9781605589077

  39. [39]

    On The Likelihood That One Unknown Probability Exceeds Another in View of The Evidence of Two Samples

    William R Thompson. “On The Likelihood That One Unknown Probability Exceeds Another in View of The Evidence of Two Samples”. In: Biometrika 25.3-4 (Dec. 1933), pp. 285–294. ISSN: 0006-3444. DOI: 10.1093/biomet/25.3-4.285. eprint: https://academic.oup.com/biomet/article-pdf/25/3-4/285/513725/25-3-4-285.pdf. URL: ht...

  40. [40]

    Sex and adverse events of adjuvant chemotherapy in colon cancer: an analysis of 34 640 patients in the ACCENT database

    Anna D Wagner et al. “Sex and adverse events of adjuvant chemotherapy in colon cancer: an analysis of 34 640 patients in the ACCENT database”. In: JNCI: Journal of the National Cancer Institute 113.4 (2021), pp. 400–407

  41. [41]

    Optimum Character of the Sequential Probability Ratio Test

    Abraham Wald and Jacob Wolfowitz. “Optimum Character of the Sequential Probability Ratio Test”. In: Annals of Mathematical Statistics 19 (1948), pp. 326–339. URL: https://api.semanticscholar.org/CorpusID:122130353

  42. [42]

    Prospective, multicenter study of 5-fluorouracil therapeutic drug monitoring in metastatic colorectal cancer treated in routine clinical practice

    Martin Wilhelm et al. “Prospective, multicenter study of 5-fluorouracil therapeutic drug monitoring in metastatic colorectal cancer treated in routine clinical practice”. In: Clinical Colorectal Cancer 15.4 (2016), pp. 381–388. DOI: 10.1016/j.clcc.2016.04.001

  43. [43]

    Augmenting Physical Models with Deep Networks for Complex Dynamics Forecasting

    Yuan Yin et al. “Augmenting Physical Models with Deep Networks for Complex Dynamics Forecasting”. In: International Conference on Learning Representations. 2021. URL: https://openreview.net/forum?id=kmG8vRXTFv