arxiv: 2605.10018 · v1 · submitted 2026-05-11 · 💻 cs.LG

The Value of Mechanistic Priors in Sequential Decision Making

Itai Shufaro, Gal Benor, Shie Mannor

Pith reviewed 2026-05-12 03:23 UTC · model grok-4.3

classification 💻 cs.LG

keywords: mechanistic priors, sequential decision making, Bayesian regret, residual entropy, hybrid models, sample complexity, occupancy measure, pharmacokinetic simulation


The pith

With a mechanistic prior, Bayesian regret scales with the residual entropy H_mech, which gives a sample complexity reduction of H(μ)/H_mech in sequential decision making.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to measure exactly how much hybrid mechanistic models—physical priors paired with learned residuals—cut the data needed for effective sequential decisions. It defines mechanistic information as the mutual information between a model's suggested policy and the true optimal policy, computed through an occupancy-weighted bias that captures how often states are visited. In the large-sample limit this leads to regret that grows only with the uncertainty remaining after the prior is applied, giving a concrete reduction factor over a baseline that starts with no prior knowledge at all. For the small-sample regime the work supplies a lower bound on the extra cost paid by a confidently mistaken prior. These results matter for settings like medical dosing where collecting each new data point is costly or risky and where generic priors may lose critical structure.

Core claim

We introduce the mechanistic information of a model—the mutual information between the model's recommended policy ˆπ and the true optimal policy π*—quantified via an occupancy-weighted bias B_μ. In the asymptotic regime (large N), matched bounds reveal that Bayesian regret scales with the residual entropy H_mech, delivering a theoretical sample complexity reduction of H(μ)/H_mech compared to an uninformed baseline. We also provide a model certificate to determine empirical sample efficiency. In the burn-in regime (small N) we establish a lower bound on the penalty incurred by confidently wrong priors, and demonstrate both sets of bounds on 5-FU dosing simulations drawn from published FOLFOX

What carries the argument

Mechanistic information: the mutual information between the model's recommended policy and the true optimal policy, computed via occupancy-weighted bias B_μ that determines residual entropy H_mech and the prior's value.
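
The relationship between these quantities can be made concrete with a small numerical sketch. It assumes, following the passage quoted later on this page, that H_mech = H(μ) − R_mech, and it treats μ as a discrete distribution over K candidate policies. The uniform μ, the value of `r_mech`, and all function names are hypothetical and are not taken from the paper.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    return -sum(q * math.log(q) for q in p if q > 0)

# Hypothetical uninformed prior over K = 8 candidate policies.
mu = [1 / 8] * 8
h_mu = entropy(mu)  # H(mu) = log 8 for a uniform prior

# Hypothetical mechanistic information in nats (an assumed value).
r_mech = 1.2

# Residual entropy, assuming H_mech = H(mu) - R_mech.
h_mech = h_mu - r_mech

# Claimed sample complexity reduction factor H(mu) / H_mech.
reduction = h_mu / h_mech

print(f"H(mu) = {h_mu:.3f} nats")
print(f"H_mech = {h_mech:.3f} nats")
print(f"reduction factor = {reduction:.2f}x")
```

Under these assumed numbers the factor exceeds 1 whenever R_mech is positive, and it grows without bound as R_mech approaches H(μ), which is why the definition of R_mech carries so much weight in the discussion further down the page.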

If this is right

Bayesian regret scales with the residual entropy H_mech that remains after the mechanistic prior is applied, rather than with the full entropy H(μ).
A model certificate can be computed to certify empirical sample efficiency from observable quantities.
Confidently incorrect priors incur a bounded but positive penalty in the small-sample regime.
LLM priors can lose substantial mechanistic information relative to physically grounded priors on the same task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The occupancy-weighted definition suggests a general recipe for injecting domain knowledge into any sequential decision problem where visitation frequencies can be estimated.
Safety-critical applications should prefer physically derived priors over broad generative models precisely because the latter can erase mechanistic structure.
The framework could be used to rank candidate priors by their expected H_mech before any online interaction begins.

Load-bearing premise

The mechanistic prior is sufficiently accurate and the occupancy measure μ accurately reflects policy overlap without introducing unaccounted bias, especially when data are scarce.

What would settle it

Measure empirical Bayesian regret in the 5-FU dosing simulation for increasing numbers of patients and check whether it tracks the predicted scaling with residual entropy H_mech once N is large.
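
A reference curve for such a check could be sketched as follows. The sketch uses the rate √(K N H_mech / log K) from the passage quoted in the Lean section of this page, with all constants dropped. The values of `K`, `h_mech`, and `eps`, and the helper names, are hypothetical; empirical regret from a simulation would be compared against these curves, not produced by them.

```python
import math

def regret_scale(K, N, H):
    """Quoted asymptotic rate sqrt(K * N * H / log K), constants dropped."""
    return math.sqrt(K * N * H / math.log(K))

def samples_for_target(K, H, eps):
    """N at which the per-round rate regret_scale / N equals eps (constants dropped)."""
    return K * H / (eps ** 2 * math.log(K))

K = 8                  # hypothetical number of candidate policies
h_mu = math.log(K)     # entropy of an assumed uniform, uninformed prior
h_mech = 0.5 * h_mu    # hypothetical residual entropy after the prior

for N in (100, 400, 1600):
    base = regret_scale(K, N, h_mu)
    mech = regret_scale(K, N, h_mech)
    print(f"N={N:5d}  baseline={base:8.2f}  mechanistic={mech:8.2f}  ratio={mech / base:.3f}")

eps = 0.1  # hypothetical per-round regret target
n_ratio = samples_for_target(K, h_mu, eps) / samples_for_target(K, h_mech, eps)
print(f"sample ratio at equal per-round regret = {n_ratio:.2f}")
```

Under this rate the regret ratio at fixed N is √(H_mech / H(μ)), while the number of samples needed to reach a fixed per-round regret is proportional to the entropy term, which recovers the H(μ)/H_mech factor.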

Figures

Figures reproduced from arXiv: 2605.10018 by Gal Benor, Itai Shufaro, Shie Mannor.

**Figure 2.** Sensitivity of the model-quality certificate to parameters.

**Figure 3.** Critical bias as a function of K. The blue curve corresponds to the theoretical sample complexity ratio. The dotted line corresponds to the baseline ratio (ρ = 1). All other parameters are held at their calibrated values (B_μ = 0.22, σ = 0.40, κ_μ = 1.8, d_F = 3, N = 12).

| Parameter | Working value | Provenance | Sweep range |
| --- | --- | --- | --- |
| T | 46 h | FOLFOX6 standard [13, 20] | - |
| Target window | [20, 30] mg·h/L | Standard [5, 20, 35] | - |
| S | 0.… | | |

Original abstract

Hybrid mechanistic models, physical priors with learned residuals, promise to reduce the data required for good decisions, but have no computable criterion to test this. We characterize the value of mechanistic priors in sequential decision-making within both asymptotic and burn-in regimes. To formalize this, we introduce the mechanistic information of a model -- the mutual information between the model's recommended policy $\hat{\pi}$ and the true optimal policy $\pi^*$ -- quantified via an occupancy-weighted bias $B_\mu$. In the asymptotic regime (large $N$), matched bounds reveal that Bayesian regret scales with the residual entropy $H_{\mathrm{mech}}$, delivering a theoretical sample complexity reduction of $H(\mu)/H_{\mathrm{mech}}$ compared to an uninformed baseline. Furthermore, we provide a model certificate to determine empirical sample efficiency. Complementarily, in the clinically relevant burn-in regime (small $N$), we establish a lower bound on the penalty incurred by confidently wrong priors. We demonstrate both the asymptotic and burn-in bounds across 5-fluorouracil (5-FU) dosing simulations motivated by published FOLFOX pharmacokinetic data, where a hybrid prior yields large sample-efficiency gains in the burn-in regime. Finally, we contrast these grounded models with LLM priors, demonstrating that LLMs can suffer severe losses in mechanistic information, thereby motivating the exclusive use of physically-grounded priors for safety-critical applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper formalizes how mechanistic priors reduce Bayesian regret via residual entropy and occupancy-weighted mutual information, with a solid dosing simulation but a potential circularity in how the occupancy measure is defined.


The main point is that the authors quantify the value of a hybrid mechanistic prior in sequential decisions by treating it as mutual information between the prior's policy and the optimal one, measured through an occupancy-weighted bias. In the large-sample limit this leads to regret scaling with residual entropy H_mech and a sample-complexity improvement of H(μ)/H_mech over an uninformed baseline. They also add a lower bound on the penalty for confidently wrong priors in the small-sample burn-in regime and test it on 5-FU dosing with published pharmacokinetic parameters. The LLM comparison is a useful side note showing how ungrounded priors can lose mechanistic information.

What the work does cleanly is tie the information-theoretic quantity directly to regret bounds and then show concrete gains in a realistic control setting rather than staying purely abstract. The simulation setup looks reasonable given that it is motivated by real data.

The soft spot is the occupancy measure μ itself. If μ is taken from the policy recommended by the mechanistic model, then the bias term B_μ and the resulting H_mech depend on the prior being evaluated. That makes the claimed reduction factor less invariant than it first appears and risks making the argument partly tautological. The paper should clarify whether μ is a fixed reference occupancy or model-induced, because the circularity concern hinges on that point. Without the full derivations it is also hard to judge how tight the matched bounds really are, or whether extra assumptions on the policy class are doing hidden work.

The paper is aimed at people working on hybrid models for data-efficient RL or safe control, particularly in healthcare or physical systems. A reader who wants theoretical grounding for when priors help will find the bounds and the dosing example useful. The manuscript is coherent and offers enough new formalization and reproducible simulation to deserve a serious referee, even if the occupancy definition needs tightening in revision.

Referee Report

2 major / 2 minor

Summary. The manuscript claims to characterize the value of mechanistic priors in sequential decision making by introducing 'mechanistic information' as the mutual information I(π̂; π*) quantified by occupancy-weighted bias B_μ. In the asymptotic large-N regime, it asserts matched bounds showing Bayesian regret scales with residual entropy H_mech, yielding sample complexity reduction H(μ)/H_mech vs uninformed baseline. It also gives a lower bound on penalty for wrong priors in small-N burn-in regime, provides a model certificate for empirical efficiency, and demonstrates gains in 5-FU dosing simulations from FOLFOX PK data, while noting LLM priors can have low mechanistic information.

Significance. If the bounds are rigorously established without circularity, this provides a novel theoretical framework for assessing data efficiency gains from hybrid mechanistic models in RL, with direct relevance to clinical applications. The inclusion of both asymptotic and burn-in analyses, plus empirical validation on pharmacokinetic simulations, adds practical value. The authors deserve explicit credit for a reproducible simulation setup motivated by published data and for highlighting the risks of LLM priors in safety-critical settings.

major comments (2)

[Definition of mechanistic information and B_μ] The occupancy measure μ is described as the state-action occupancy of the policy recommended by the mechanistic model. This choice risks making B_μ and thus H_mech dependent on the prior itself, potentially rendering the regret scaling with H_mech and the reduction factor H(μ)/H_mech tautological rather than an independent prediction. Please provide a formal definition (e.g., Eq. for B_μ) and show that the mutual information remains unbiased or that the bounds hold for a fixed reference μ independent of the model.
[Asymptotic regime analysis] The claim of matched upper and lower bounds on Bayesian regret scaling with H_mech is central but lacks visible derivation steps or exact definitions of H_mech in the abstract. In the section presenting these bounds, include the key equations and proof outline to allow verification that the scaling is not an artifact of the definition.

minor comments (2)

[Abstract] The abstract mentions a 'model certificate to determine empirical sample efficiency' but provides no details; expand briefly or reference the relevant section.
[Simulations] For the 5-FU dosing simulations, specify the number of independent runs, exact controls for the uninformed baseline, and any hyperparameter choices to ensure reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important aspects of our theoretical framework. We address each major comment below and will incorporate revisions to enhance clarity and rigor.


Referee: [Definition of mechanistic information and B_μ] The occupancy measure μ is described as the state-action occupancy of the policy recommended by the mechanistic model. This choice risks making B_μ and thus H_mech dependent on the prior itself, potentially rendering the regret scaling with H_mech and the reduction factor H(μ)/H_mech tautological rather than an independent prediction. Please provide a formal definition (e.g., Eq. for B_μ) and show that the mutual information remains unbiased or that the bounds hold for a fixed reference μ independent of the model.

Authors: We acknowledge the potential for circularity in the current presentation and agree that clarification is needed. In the revised manuscript, we will provide the formal definition of B_μ as the occupancy-weighted bias with respect to a fixed reference occupancy measure μ, chosen independently of the mechanistic model (e.g., the occupancy induced by the true optimal policy π* or a baseline policy). This ensures that the mechanistic information I(π̂; π*) is defined with respect to an external reference, avoiding dependence on the prior. We will also demonstrate that the regret bounds hold under this fixed μ, confirming they are not tautological. revision: yes
Referee: [Asymptotic regime analysis] The claim of matched upper and lower bounds on Bayesian regret scaling with H_mech is central but lacks visible derivation steps or exact definitions of H_mech in the abstract. In the section presenting these bounds, include the key equations and proof outline to allow verification that the scaling is not an artifact of the definition.

Authors: We agree that additional detail on the derivations would improve verifiability. In the revised version, we will expand the section on the asymptotic regime to include the precise definition of the residual entropy H_mech (as the entropy of the residual uncertainty after incorporating the mechanistic prior) and outline the key steps in the proofs for both the upper and lower bounds on Bayesian regret. This will explicitly show how the scaling with H_mech arises from information-theoretic arguments and is independent of definitional artifacts. revision: yes

Circularity Check

1 step flagged

Regret scaling with H_mech and reduction H(μ)/H_mech reduce to the definition of mechanistic information via model-induced μ

specific steps

self-definitional [Abstract (mechanistic information definition and asymptotic regime claim)]
"we introduce the mechanistic information of a model -- the mutual information between the model's recommended policy π̂ and the true optimal policy π* -- quantified via an occupancy-weighted bias B_μ. In the asymptotic regime (large N), matched bounds reveal that Bayesian regret scales with the residual entropy H_mech, delivering a theoretical sample complexity reduction of H(μ)/H_mech compared to an uninformed baseline."

H_mech is the residual entropy after subtracting the mechanistic information I(π̂; π*), which is itself quantified by B_μ using the occupancy measure μ induced by the model's recommended policy π̂. Substituting the model's own occupancy for a reference measure alters B_μ and therefore H_mech, so the scaling of regret with H_mech and the explicit reduction factor H(μ)/H_mech are obtained by construction from the prior's definition rather than derived independently.

full rationale

The paper introduces mechanistic information as I(π̂; π*) quantified by occupancy-weighted bias B_μ, then states that matched bounds show Bayesian regret scales with residual entropy H_mech (entropy after subtracting this I) and yields sample-complexity reduction H(μ)/H_mech. Because μ is the state-action occupancy of the policy recommended by the mechanistic model itself, both B_μ and the resulting H_mech are constructed from the prior's own output. The claimed asymptotic scaling and reduction factor are therefore equivalent to the amount of information the prior was defined to capture, rather than an independent first-principles prediction. The burn-in lower bound on wrong priors stands separately and does not exhibit this reduction.
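
The dependence on μ described here can be illustrated with a toy calculation. The sketch below uses the B_μ expression quoted in the Lean section of this page and compares a weighting concentrated on policies where the model is accurate with a fixed uniform reference. The per-policy errors and both weightings are hypothetical numbers chosen only to show the effect; they say nothing about how μ is actually defined in the paper.

```python
import math

def occupancy_weighted_bias(mu, errors):
    """B_mu = sqrt(sum_k mu(pi_k) * err_k^2), with err_k = J(pi_k; M*) - J(pi_k; M_hat)."""
    return math.sqrt(sum(m * e ** 2 for m, e in zip(mu, errors)))

# Hypothetical per-policy value errors of the mechanistic model.
errors = [0.05, 0.10, 0.60]

# Hypothetical model-induced weighting: mass sits where the model is accurate.
mu_model = [0.70, 0.25, 0.05]

# Hypothetical fixed reference weighting, chosen independently of the model.
mu_fixed = [1 / 3, 1 / 3, 1 / 3]

b_model = occupancy_weighted_bias(mu_model, errors)
b_fixed = occupancy_weighted_bias(mu_fixed, errors)

print(f"B_mu under model-induced weights: {b_model:.4f}")
print(f"B_mu under fixed reference weights: {b_fixed:.4f}")
```

With identical model errors, the two weightings report different biases, so any quantity derived from B_μ inherits the choice of μ. This is the reason the referee asks for a fixed reference measure.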

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

The central claims rest on newly introduced quantities (mechanistic information, B_μ, H_mech) and domain assumptions about hybrid models in MDPs; no explicit free parameters are fitted in the abstract description.

axioms (1)

domain assumption: The decision problem is a Markov decision process where the mechanistic component provides a structured prior over dynamics or rewards.
Required for the hybrid model to be well-defined and for the occupancy measure to be meaningful.

invented entities (3)

mechanistic information (no independent evidence)
purpose: Quantify the value of a mechanistic prior via mutual information between recommended and optimal policies
Newly defined to formalize the benefit of hybrid models.
occupancy-weighted bias B_μ (no independent evidence)
purpose: Measure the mechanistic information through policy overlap weighted by state occupancy
Introduced to operationalize the mutual information definition.
residual entropy H_mech (no independent evidence)
purpose: Capture remaining uncertainty after incorporating the mechanistic prior
Used to scale Bayesian regret in the asymptotic analysis.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

mechanistic information R_mech = I_μ(π*; π̂) quantified via occupancy-weighted bias B_μ ... residual entropy H_mech = H(μ) − R_mech ... Bayesian regret scales with √(K N H_mech / log K)
IndisputableMonolith/Foundation/BranchSelection.lean branch_selection unclear

Relation between the paper passage and the cited Recognition theorem.

B_μ = √(Σ μ(π_k) (J(π_k; M*) − J(π_k; M̂))²) ... channel capacity C(B_μ) = (d_F/2) log(1 + κ_μ² σ_F² / (κ_μ² B_μ² + σ²))
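
The two quoted expressions can be evaluated directly. The sketch below is a plain transcription of them into Python, evaluated at the calibrated values listed in the Figure 3 caption (B_μ = 0.22, σ = 0.40, κ_μ = 1.8, d_F = 3). The value of σ_F does not appear on this page, so `sigma_f = 1.0` is an assumption, as are the function names and the toy inputs to the bias function.

```python
import math

def occupancy_weighted_bias(mu, j_true, j_model):
    """B_mu = sqrt(sum_k mu(pi_k) * (J(pi_k; M*) - J(pi_k; M_hat))^2)."""
    return math.sqrt(sum(m * (a - b) ** 2 for m, a, b in zip(mu, j_true, j_model)))

def channel_capacity(b_mu, sigma, kappa_mu, d_f, sigma_f):
    """C(B_mu) = (d_F / 2) * log(1 + kappa_mu^2 sigma_F^2 / (kappa_mu^2 B_mu^2 + sigma^2))."""
    signal = kappa_mu ** 2 * sigma_f ** 2
    noise = kappa_mu ** 2 * b_mu ** 2 + sigma ** 2
    return 0.5 * d_f * math.log(1.0 + signal / noise)

# Toy inputs (hypothetical): two policies, the model misjudges the first by 1.0.
toy_bias = occupancy_weighted_bias([0.5, 0.5], [1.0, 1.0], [0.0, 1.0])
print(f"toy B_mu = {toy_bias:.4f}")

# Figure 3 calibrated values; sigma_f is an assumed placeholder.
capacity = channel_capacity(b_mu=0.22, sigma=0.40, kappa_mu=1.8, d_f=3, sigma_f=1.0)
print(f"C(B_mu) at calibrated values (sigma_F assumed 1.0) = {capacity:.3f} nats")

# Capacity falls as the bias grows, since B_mu enters the noise term.
for b in (0.0, 0.22, 0.5, 1.0):
    print(f"B_mu = {b:.2f} -> C = {channel_capacity(b, 0.40, 1.8, 3, 1.0):.3f}")
```

In this form the bias acts as additional noise in a Gaussian-channel-style capacity, so a larger B_μ lowers the information the model can convey about the optimal policy.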

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 1 internal anchor

[1] Shipra Agrawal and Navin Goyal. “Analysis of Thompson sampling for the multi-armed bandit problem”. In: Conference on learning theory. JMLR Workshop and Conference Proceedings. 2012, pp. 39.1–39.26
[2] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. “Finite-time Analysis of the Multiarmed Bandit Problem”. In: Mach. Learn. 47.2–3 (May 2002), pp. 235–256. ISSN: 0885-6125. DOI: 10.1023/A:1013689704352. URL: https://doi.org/10.1023/A:1013689704352
[3] M. G. Azar, R. Munos, and H. J. Kappen. “Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model”. In: Machine Learning 91.3 (2013), pp. 325–349
[4] J Frédéric Bonnans and Alexander Shapiro. Perturbation analysis of optimization problems. Springer Science & Business Media, 2013
[5] Olivier Capitain et al. “Individual fluorouracil dose adjustment in FOLFOX based on pharmacokinetic follow-up compared with conventional body-area-surface dosing: a phase II, proof-of-concept study”. In: Clinical colorectal cancer 11.4 (2012), pp. 263–267
[6] Ricky TQ Chen et al. “Neural ordinary differential equations”. In: Advances in neural information processing systems 31 (2018)
[7] Sayak Ray Chowdhury and Aditya Gopalan. “On kernelized multi-armed bandits”. In: Proceedings of the 34th International Conference on Machine Learning - Volume 70. ICML’17. Sydney, NSW, Australia: JMLR.org, 2017, pp. 844–853
[8] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. 2nd ed. Wiley-Interscience, 2006
[9] Sarah Dean et al. “On the sample complexity of the linear quadratic regulator”. In: Foundations of Computational Mathematics 20.4 (2020), pp. 633–679
[10] Ashley L. Eadie et al. “The arrival of digital twins and in silico trials in drug development”. In: Nature Medicine (2026)
[11] Damien Ernst, Pierre Geurts, and Louis Wehenkel. “Tree-based batch mode reinforcement learning”. In: Journal of Machine Learning Research 6 (2005)
[12] L. Fang, W. Xin, H. Ding, et al. “Pharmacokinetically guided algorithm of 5-fluorouracil dosing: a meta-analysis”. In: Scientific Reports 6 (2016), p. 25913
[13] Gini F Fleming et al. “Circadian variation in plasma 5-fluorouracil concentrations during a 24 hour constant-rate infusion”. In: BMC cancer 15.1 (2015), p. 69
[14] Erick Gamelin et al. “Individual Fluorouracil Dose Adjustment Based on Pharmacokinetic Follow-Up Compared With Conventional Dosage: Results of a Multicenter Randomized Trial of Patients With Metastatic Colorectal Cancer”. In: Journal of Clinical Oncology 26.13 (2008). PMID: 18445839, pp. 2099–2105. DOI: 10.1200/JCO.2007.13.3934. eprint: https://...
[15] Divyansh Garg et al. “LISA: Learning Interpretable Skill Abstractions from Language”. In: Advances in Neural Information Processing Systems. Ed. by Alice H. Oh et al. 2022. URL: https://openreview.net/forum?id=XZhipvOUBB
[16] Aditya Gopalan, Shie Mannor, and Yishay Mansour. “Thompson Sampling for Complex Online Problems”. In: Proceedings of the 31st International Conference on Machine Learning. Ed. by Eric P. Xing and Tony Jebara. Vol. 32. Proceedings of Machine Learning Research. Beijing, China: PMLR, 2014, pp. 100–108. URL: https://proceedings.mlr.press/v32/gopalan14.html
[17] Junya Honda and Akimichi Takemura. “An Asymptotically Optimal Bandit Algorithm for Bounded Support Models”. In: COLT 2010 - The 23rd Conference on Learning Theory. Jan. 2010, pp. 67–79
[18] Brian Ichter et al. “Do As I Can, Not As I Say: Grounding Language in Robotic Affordances”. In: 6th Annual Conference on Robot Learning. 2022. URL: https://openreview.net/forum?id=bdHkMjBJG_w
[19] Chi Jin et al. “Provably efficient reinforcement learning with linear function approximation”. In: Conference on learning theory. PMLR. 2020, pp. 2137–2143
[20] Rajesh R. Kaldate et al. “Modeling the 5-fluorouracil area under the curve versus dose relationship to develop a pharmacokinetic dosing algorithm for colorectal cancer patients receiving FOLFOX6”. In: The Oncologist 17.3 (2012), pp. 296–302
[21] George Em Karniadakis et al. “Physics-informed machine learning”. In: Nature Reviews Physics 3.6 (2021), pp. 422–440
[22] Branislav Kveton et al. “Meta-Thompson Sampling”. In: Proceedings of the 38th International Conference on Machine Learning. Ed. by Marina Meila and Tong Zhang. Vol. 139. Proceedings of Machine Learning Research. PMLR, 2021, pp. 5884–5893. URL: https://proceedings.mlr.press/v139/kveton21a.html
[23] T. L. Lai and Herbert Robbins. “Asymptotically efficient adaptive allocation rules”. In: Advances in Applied Mathematics 6.1 (1985), pp. 4–22. ISSN: 0196-8858. DOI: https://doi.org/10.1016/0196-8858(85)90002-8. URL: https://www.sciencedirect.com/science/article/pii/0196885885900028
[24] Tor Lattimore and Csaba Szepesvári. Bandit algorithms. Cambridge University Press, 2020
[25] Alessandro Lazaric and Mohammad Ghavamzadeh. “Bayesian multi-task reinforcement learning”. In: Proceedings of the 27th International Conference on Machine Learning. ICML’10. Haifa, Israel: Omnipress, 2010, pp. 599–606. ISBN: 9781605589077
[26] Francis A. Lévi et al. “Circadian timing in cancer treatments”. In: Annual Review of Pharmacology and Toxicology 50 (2010), pp. 377–421
[27] M. Li et al. “Drug monitoring detects under- and overdosing in patients receiving 5-fluorouracil-containing chemotherapy: results of a prospective, multicenter German observational study”. In: ESMO Open 8.2 (2023), p. 101201
[28] Che-Yu Liu and Lihong Li. “On the Prior Sensitivity of Thompson Sampling”. In: Proceedings of the 27th International Conference on Algorithmic Learning Theory (ALT). Springer, 2016, pp. 321–336. DOI: 10.1007/978-3-319-46379-7_22
[29] Katarzyna Morawska et al. “5-FU therapeutic drug monitoring as a valuable option to reduce toxicity in patients with gastrointestinal cancer”. In: Oncotarget 9.14 (2018), p. 11559
[30] Christopher Rackauckas et al. “Universal differential equations for scientific machine learning”. In: arXiv preprint arXiv:2001.04385 (2020)
[31] M. Raissi, P. Perdikaris, and G.E. Karniadakis. “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations”. In: Journal of Computational Physics 378 (2019), pp. 686–707. ISSN: 0021-9991. DOI: https://doi.org/10.1016/j.jcp.2018.10.045. URL: https:...
[32] Daniel Russo and Benjamin Van Roy. “Learning to Optimize via Posterior Sampling”. In: Mathematics of Operations Research 39.4 (2014), pp. 1221–1243. ISSN: 0364765X, 15265471. URL: http://www.jstor.org/stable/24541007 (visited on 04/11/2026)
[33] Daniel Russo and Benjamin Van Roy. “An information-theoretic analysis of Thompson sampling”. In: Journal of Machine Learning Research 17.68 (2016), pp. 1–30
[34] Daniel J. Russo et al. “A tutorial on Thompson sampling”. In: Foundations and Trends® in Machine Learning 11.1 (2018), pp. 1–99
[35] M. Wasif Saif et al. “Pharmacokinetically guided dose adjustment of 5-fluorouracil: a rational approach to improving therapeutic outcomes”. In: JNCI: Journal of the National Cancer Institute 101.22 (2009), pp. 1543–1552
[36] Susan M. Shortreed et al. “Informing sequential clinical decision-making through reinforcement learning: an empirical study”. In: Mach. Learn. 84.1–2 (July 2011), pp. 109–136. ISSN: 0885-6125. DOI: 10.1007/s10994-010-5229-0. URL: https://doi.org/10.1007/s10994-010-5229-0
[37] Itai Shufaro et al. “On Bits and Bandits: Quantifying the Regret-Information Trade-off”. In: The Thirteenth International Conference on Learning Representations. 2025. URL: https://openreview.net/forum?id=0oWGVvC6oq
[38] Niranjan Srinivas et al. “Gaussian process optimization in the bandit setting: no regret and experimental design”. In: Proceedings of the 27th International Conference on Machine Learning. ICML’10. Haifa, Israel: Omnipress, 2010, pp. 1015–1022. ISBN: 9781605589077
[39] William R Thompson. “On The Likelihood That One Unknown Probability Exceeds Another in View of The Evidence of Two Samples”. In: Biometrika 25.3-4 (Dec. 1933), pp. 285–294. ISSN: 0006-3444. DOI: 10.1093/biomet/25.3-4.285. eprint: https://academic.oup.com/biomet/article-pdf/25/3-4/285/513725/25-3-4-285.pdf. URL: ht...
[40] Anna D Wagner et al. “Sex and adverse events of adjuvant chemotherapy in colon cancer: an analysis of 34 640 patients in the ACCENT database”. In: JNCI: Journal of the National Cancer Institute 113.4 (2021), pp. 400–407
[41] Abraham Wald and Jacob Wolfowitz. “Optimum Character of the Sequential Probability Ratio Test”. In: Annals of Mathematical Statistics 19 (1948), pp. 326–339. URL: https://api.semanticscholar.org/CorpusID:122130353
[42] Martin Wilhelm et al. “Prospective, multicenter study of 5-fluorouracil therapeutic drug monitoring in metastatic colorectal cancer treated in routine clinical practice”. In: Clinical Colorectal Cancer 15.4 (2016), pp. 381–388. DOI: 10.1016/j.clcc.2016.04.001
[43] Yuan Yin et al. “Augmenting Physical Models with Deep Networks for Complex Dynamics Forecasting”. In: International Conference on Learning Representations. 2021. URL: https://openreview.net/forum?id=kmG8vRXTFv

A Notation and conventions

The following table summarises the symbols used throughout the paper, in order of first appearance. Full formal defin...