pith. machine review for the scientific record.

arxiv: 2605.07485 · v1 · submitted 2026-05-08 · 💻 cs.LG · cs.AI

Recognition: 2 theorem links

· Lean Theorem

Excluding the Target Domain Improves Extrapolation: Deconfounded Hierarchical Physics Constraints

Tsuyoshi Okita

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 01:47 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI
keywords extrapolation · physics-constrained models · deconfounding · hierarchical constraints · out-of-distribution generalization · battery temperature · Fourier neural operators

The pith

Excluding target-domain data from pretraining improves extrapolation by 39 percent in physics-constrained models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to solve poor extrapolation in physics-constrained generative models when test conditions differ from training data. It introduces a gate that detects and removes temperature confounding at successive levels of physical rules before enforcing those rules from coarse to fine. A central result is that withholding the target temperature range during pretraining lets the model learn more general physical patterns, cutting error compared with including that data. This matters for tasks such as forecasting battery temperatures across wide environmental ranges where new conditions appear at deployment. If the approach holds, models could maintain accuracy when physical laws interact with shifting external variables without retraining on every new scenario.

Core claim

The Deconfounded Hierarchical Gate identifies when temperature confounding affects each physical constraint level through counterfactual estimation with the do-operator and backdoor adjustment, then enforces constraints progressively from coarse to fine. Pretraining without target-domain data yields RMSE of 0.224 versus 0.324 when target data is included, a 39 percent gain in extrapolation; on the lithium-ion battery benchmark trained at 24 degrees Celsius and tested at 4 to 43 degrees Celsius the method reaches RMSE 0.215, a 46 percent improvement over the unconstrained baseline of 0.397.
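As a quick sanity check (values copied from the abstract; nothing here is recomputed from data), the 46 percent figure is the standard relative-improvement arithmetic:

```python
# Reported RMSE scores from the paper's abstract (not recomputed here).
baseline_rmse = 0.397   # unconstrained Pure CFM
method_rmse = 0.215     # DHG-constrained model

# Relative improvement over the baseline, as a fraction of baseline RMSE.
improvement = (baseline_rmse - method_rmse) / baseline_rmse
print(f"relative improvement: {improvement:.1%}")  # → 45.8%, reported as 46%
```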

What carries the argument

The Deconfounded Hierarchical Gate (DHG), a mechanism that combines do-operator counterfactual estimation and backdoor adjustment to isolate intrinsic physical inconsistency from temperature confounding before applying hierarchical constraints progressively.
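No code accompanies the material reviewed here, so the following is a schematic reading of the gate, with invented names (`adjusted_mean_residual`, `gate`) and a single binned confounder: the observed physics residual at each constraint level is compared with its backdoor-adjusted counterpart, and only the part that survives adjustment drives the gate.

```python
import numpy as np

def adjusted_mean_residual(residuals, levels, temps, level, temp_bins):
    """Backdoor estimate of E[residual | do(level)]: average the
    temperature-conditional residual means at this constraint level,
    weighted by the marginal temperature distribution. Schematic only;
    temperature is discretized into bins."""
    bin_ids = np.digitize(temps, temp_bins)
    adjusted = 0.0
    for b in range(len(temp_bins) + 1):
        p_b = float(np.mean(bin_ids == b))           # marginal P(T in bin b)
        mask = (levels == level) & (bin_ids == b)
        if mask.any():                               # P(b) * E[R | level, b]
            adjusted += p_b * float(residuals[mask].mean())
    return adjusted

def gate(observed, adjusted, eps=1e-8):
    """Gate opens toward 1 when the inconsistency survives
    deconfounding, i.e. is not explained away by temperature."""
    return adjusted / (observed + eps)
```

With temperatures balanced across levels the adjusted and observed residuals coincide and the gate stays near 1; it is only under confounded sampling that the two diverge.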

If this is right

  • Hierarchical constraints applied progressively outperform a single static regularization term across the generation process.
  • Fourier Neural Operators capture domain-agnostic physical patterns more effectively when target-domain examples are withheld from pretraining.
  • Backdoor adjustment at each constraint level isolates genuine physical violations from spurious temperature effects.
  • The method delivers RMSE of 0.215 on the battery temperature extrapolation task versus 0.397 for the unconstrained baseline.
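The first bullet, progressive rather than static constraint weighting, can be sketched as a schedule (a minimal illustration with an assumed linear ramp and three levels; the paper's actual schedule is not given in the abstract):

```python
def constraint_weights(step, total_steps, n_levels=3):
    """Coarse-to-fine weighting over the generation process: level 0
    (the coarsest physical law) ramps in first, finer levels follow.
    Returns one weight in [0, 1] per level."""
    t = step / total_steps
    # level k switches on at t = k / n_levels and ramps linearly to 1
    return [min(1.0, max(0.0, (t - k / n_levels) * n_levels))
            for k in range(n_levels)]

# The per-step physics loss would then be
#   sum(w * residual(level) for w, level in zip(weights, levels))
# replacing a single static regularization term.
```

By the end of generation all levels are fully enforced; a static regularizer corresponds to holding every weight constant throughout.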

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same exclusion strategy during pretraining could apply to other generative models facing distribution shifts driven by measurable external variables.
  • Deconfounding at multiple hierarchy levels might prove useful in non-battery physics domains where similar confounding structures appear.
  • Testing the gate on tasks without an obvious single confounder would clarify how much the temperature-specific adjustment contributes to the overall gain.

Load-bearing premise

Temperature is the main confounder that can be removed via do-operator and backdoor adjustment without introducing new inconsistencies into the enforcement of physical laws.
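The premise invokes the standard backdoor formula; with temperature T as the sole adjusted confounder, the deconfounded effect of an intervention on a constraint-level variable X on the inconsistency signal Y would read (the abstract does not state the actual adjustment set, so taking T alone is this review's assumption):

```latex
P(Y \mid \mathrm{do}(X = x)) \;=\; \sum_{t} P(Y \mid X = x,\, T = t)\, P(T = t)
```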

What would settle it

An experiment that includes target-domain temperature data in pretraining and obtains equal or lower extrapolation RMSE than the version that excludes it would contradict the reported pretraining benefit.
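That test is a two-arm pretraining ablation. As a sketch (the `train_fn` and `eval_rmse` callables are placeholders, not the paper's code):

```python
def pretraining_ablation(train_fn, eval_rmse, source_data, target_data):
    """Train once withholding the target domain and once including it,
    then compare extrapolation RMSE on the same held-out target test set.
    The reported benefit is contradicted if the 'included' arm matches
    or beats the 'excluded' arm."""
    model_excluded = train_fn(source_data)
    model_included = train_fn(source_data + target_data)
    rmse_excluded = eval_rmse(model_excluded)
    rmse_included = eval_rmse(model_included)
    return {
        "excluded": rmse_excluded,
        "included": rmse_included,
        "claim_contradicted": rmse_included <= rmse_excluded,
    }
```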

Figures

Figures reproduced from arXiv: 2605.07485 by Tsuyoshi Okita.

Figure 1. Overview of the HPC-FNO-CFM framework. Left: FNO(1) pretrained on multi-condition battery data (Stage 1) using spectral convolution to learn condition-dependent physical patterns; parameters are frozen after Stage 1. Center: Condition-conditioned CFM generation network (Stage 2). The frozen FNO(1) provides physical guidance to the CFM velocity field through an Integration Layer, analogous to PDE guidance a… view at source ↗
read the original abstract

Extrapolation to out-of-distribution conditions is a fundamental challenge for physics-constrained deep generative models. Existing methods apply physical constraints as a single static regularization term uniformly across the generation process, and address neither the hierarchical structure of physical laws nor the confounding variable problem. We propose the Deconfounded Hierarchical Gate (DHG), which serves as a diagnostic and control mechanism: it identifies when and how strongly temperature confounding contaminates each constraint level, so that hierarchical gates reflect intrinsic physical inconsistency rather than spurious temperature effects. DHG combines counterfactual estimation via the do-operator with backdoor adjustment to remove confounding, then applies Coarse-to-Fine physical constraints progressively. We report a counter-intuitive finding in pretraining: excluding the target-domain data from pretraining outperforms including it by 39% in extrapolation performance (RMSE 0.224 vs. 0.324). This occurs because FNO learns domain-agnostic physical patterns that transfer more effectively when the target domain is withheld. On a lithium-ion battery temperature extrapolation benchmark (trained at 24 degrees Celsius, evaluated at 4.0--43.0 degrees Celsius), our method achieves RMSE = 0.215, a 46% improvement over the unconstrained baseline (Pure CFM: 0.397).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes the Deconfounded Hierarchical Gate (DHG) for physics-constrained deep generative models. DHG uses the do-operator and backdoor adjustment to identify and remove temperature confounding at each level of a hierarchy of physical constraints, enabling better extrapolation. A key empirical claim is that excluding target-domain data from pretraining improves extrapolation performance by 39% (RMSE 0.224 vs. 0.324). On a lithium-ion battery temperature extrapolation benchmark (train at 24°C, evaluate at 4–43°C), DHG achieves RMSE 0.215, a 46% improvement over the unconstrained Pure CFM baseline (0.397).

Significance. If the causal graph is correctly specified and the reported gains are robust to alternative adjustment sets and data splits, the work could meaningfully advance physics-informed generative modeling by separating intrinsic physical violations from confounding effects. The counter-intuitive pretraining result, if reproducible, would also challenge standard practice in domain-adaptive scientific ML.

major comments (3)
  1. [Abstract] Abstract: the 39% and 46% RMSE gains are stated without error bars, statistical tests, ablation tables, or any description of how the causal graph or adjustment set for temperature was chosen and validated. Because the entire deconfounding claim rests on backdoor adjustment being valid, the absence of this evidence is load-bearing for the central performance claims.
  2. [Method] Method description: backdoor adjustment is invoked to isolate temperature confounding at each hierarchical constraint level, yet no causal graph, no list of observed covariates (current, SOC, voltage, etc.), and no sensitivity check to alternative graphs are provided. If the graph is misspecified, the adjustment can leave residual confounding or introduce new bias, directly undermining the assertion that the gates reflect 'intrinsic physical inconsistency rather than spurious temperature effects.'
  3. [Results] Results section: the claim that 'FNO learns domain-agnostic physical patterns' when target data are withheld is presented as an explanation for the 39% gain, but no supporting analysis (e.g., feature visualizations, domain-invariance metrics, or controlled ablations that isolate the exclusion effect from the DHG component) is referenced.
minor comments (2)
  1. Notation for the hierarchical gates and the progressive Coarse-to-Fine loss terms should be introduced with explicit equations rather than high-level prose.
  2. [Abstract] The abstract would be clearer if it briefly named the other baselines beyond 'Pure CFM' and stated the number of random seeds or cross-validation folds used for the reported RMSE values.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped clarify several aspects of our work. We address each major comment below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the 39% and 46% RMSE gains are stated without error bars, statistical tests, ablation tables, or any description of how the causal graph or adjustment set for temperature was chosen and validated. Because the entire deconfounding claim rests on backdoor adjustment being valid, the absence of this evidence is load-bearing for the central performance claims.

    Authors: We agree that the abstract would be strengthened by including statistical context for the reported gains. In the revised manuscript, we will add error bars to the RMSE figures and reference the statistical tests from the results section. We will also include a brief note on the adjustment set (current, SOC, voltage) selected from domain knowledge of battery thermal dynamics, with full details and ablations directed to the method and supplementary sections. revision: yes

  2. Referee: [Method] Method description: backdoor adjustment is invoked to isolate temperature confounding at each hierarchical constraint level, yet no causal graph, no list of observed covariates (current, SOC, voltage, etc.), and no sensitivity check to alternative graphs are provided. If the graph is misspecified, the adjustment can leave residual confounding or introduce new bias, directly undermining the assertion that the gates reflect 'intrinsic physical inconsistency rather than spurious temperature effects.'

    Authors: We thank the referee for this observation. The method section describes the do-operator and backdoor adjustment but lacks an explicit causal graph and covariate list. We will add a figure showing the causal graph with temperature as confounder and observed variables (current, SOC, voltage). A sensitivity analysis to alternative adjustment sets will be added to the supplementary material to demonstrate robustness and confirm that the gates primarily capture intrinsic physical inconsistencies. revision: yes

  3. Referee: [Results] Results section: the claim that 'FNO learns domain-agnostic physical patterns' when target data are withheld is presented as an explanation for the 39% gain, but no supporting analysis (e.g., feature visualizations, domain-invariance metrics, or controlled ablations that isolate the exclusion effect from the DHG component) is referenced.

    Authors: We acknowledge that additional analysis would better support the explanation for the pretraining result. In the revised results section, we will include feature visualizations of FNO representations and domain-invariance metrics (e.g., MMD) comparing pretraining regimes. Controlled ablations isolating the data-exclusion effect from DHG will also be added to substantiate the claim that withholding target data enables better domain-agnostic pattern learning. revision: yes
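The promised domain-invariance check could use, for example, an RBF-kernel maximum mean discrepancy between FNO feature samples from the two pretraining regimes; a generic sketch follows (the rebuttal names MMD but specifies no implementation):

```python
import numpy as np

def mmd_rbf(x, y, gamma=1.0):
    """Squared maximum mean discrepancy between sample sets x and y
    (rows are samples) under an RBF kernel. Values near zero indicate
    the two feature distributions are hard to tell apart."""
    def kernel(a, b):
        sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq_dists)
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()
```

Applied to FNO features extracted under each pretraining regime, a lower MMD against held-out target-domain features would support the domain-agnostic-pattern explanation.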

Circularity Check

0 steps flagged

No circularity: empirical claims rest on benchmark comparisons, not self-referential derivations

full rationale

The provided text (abstract and description) introduces DHG as a combination of standard causal tools (do-operator, backdoor adjustment) with hierarchical constraints and reports numerical improvements on a lithium-ion battery extrapolation task. No equations, fitted parameters renamed as predictions, or self-citations are visible that would reduce any claimed result to its own inputs by construction. The counter-intuitive pretraining finding is stated as an observed outcome rather than a derived tautology, and the method description relies on external causal inference concepts without load-bearing self-references or ansatzes smuggled via prior author work. The derivation chain is therefore self-contained against the reported external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are detailed beyond the high-level description of DHG and causal operators.

pith-pipeline@v0.9.0 · 5521 in / 1086 out tokens · 32079 ms · 2026-05-11T01:47:32.853207+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 3 internal anchors

  1. [1]

    Shortest-path flow matching with mixture-conditioned bases for OOD generalization

    Alejandro Almodóvar et al. Shortest-path flow matching with mixture-conditioned bases for OOD generalization. arXiv preprint arXiv:2601.11827, 2026

  2. [2]

    DeCaFlow: A deconfounding causal generative model

    Alejandro Almodóvar, Adrián Javaloy, Juan Parras, Santiago Zazo, and Isabel Valera. DeCaFlow: A deconfounding causal generative model. In Advances in Neural Information Processing Systems (NeurIPS), 2025

  3. [3]

    Rademacher and Gaussian complexities: Risk bounds and structural results

    Peter L. Bartlett and Shahar Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002

  4. [4]

    Physics-informed diffusion models

    Jan-Hendrik Bastek, WaiChing Sun, and Dennis M. Kochmann. Physics-informed diffusion models. In Proceedings of the 12th International Conference on Learning Representations (ICLR), 2024

  5. [5]

    Dynaformer: A deep learning model for ageing-aware battery discharge prediction

    Luca Biggio, Tommaso Bendinelli, Chetan Kulkarni, and Olga Fink. Dynaformer: A deep learning model for ageing-aware battery discharge prediction. arXiv preprint arXiv:2206.02555, 2022

  6. [6]

    Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems

    Tianping Chen and Hong Chen. Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Transactions on Neural Networks, 6(4):911–917, 1995

  7. [7]

    A simple framework for contrastive learning of visual representations

    Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning (ICML), pages 1597–1607, 2020

  8. [8]

    Theory of Ordinary Differential Equations

    Earl A. Coddington and Norman Levinson. Theory of Ordinary Differential Equations. McGraw-Hill, 1955

  9. [9]

    Variational physics-informed neural operator (VINO) for solving partial differential equations

    Mehmet Serhat Eshaghi, Cosmin Anitescu, Manish Thombre, Yizheng Wang, Xiaoying Zhuang, and Timon Rabczuk. Variational physics-informed neural operator (VINO) for solving partial differential equations. Computer Methods in Applied Mechanics and Engineering, 437:117785, 2025

  10. [10]

    Lawrence C. Evans. Partial Differential Equations . American Mathematical Society, 2nd edition, 2010

  11. [11]

    Accelerated battery life testing dataset

    Kai Fricke, Rafael Nascimento, Marco Corbetta, Chetan Kulkarni, and Felipe Viana. Accelerated battery life testing dataset. NASA Prognostics Data Repository, 2023

  12. [12]

    Note on the derivatives with respect to a parameter of the solutions of a system of differential equations

    Thomas H. Gronwall. Note on the derivatives with respect to a parameter of the solutions of a system of differential equations. Annals of Mathematics, 20(4):292–296, 1919

  13. [13]

    GANs trained by a two time-scale update rule converge to a local Nash equilibrium

    Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, volume 30, 2017

  14. [14]

    Overcoming catastrophic forgetting in neural networks

    James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(1...

  15. [15]

    Fourier Neural Operator for Parametric Partial Differential Equations

    Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895, 2020

  16. [16]

    Neural Operator: Graph Kernel Network for Partial Differential Equations

    Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485, 2020

  17. [17]

    Physics-informed neural operator for learning partial differential equations

    Zongyi Li, Hongkai Zheng, Nikola Kovachki, David Jin, Haoxuan Chen, Burigede Liu, Kamyar Azizzadenesheli, and Anima Anandkumar. Physics-informed neural operator for learning partial differential equations. ACM/IMS Journal of Data Science, 1(3):1–27, 2024

  18. [18]

    Flow Matching for Generative Modeling

    Yaron Lipman, Ricky T.Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747, 2022

  19. [19]

    Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators

    Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3:218–229, 2021

  20. [20]

    About the constants in Talagrand's concentration inequalities for empirical processes

    Pascal Massart. About the constants in Talagrand's concentration inequalities for empirical processes. The Annals of Probability, 28(2):863–884, 2000

  21. [21]

    Michael McCloskey and Neal J. Cohen. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation, volume 24, pages 109–165. Academic Press, 1989

  22. [22]

    Foundations of Machine Learning

    Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2nd edition, 2018

  23. [23]

    Causality: Models, Reasoning, and Inference

    Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000

  24. [24]

    Causal inference by using invariant prediction: identification and confidence intervals

    Jonas Peters, Peter Bühlmann, and Nicolai Meinshausen. Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society: Series B, 78(5):947–1012, 2016

  25. [25]

    Mémoire sur la théorie des équations différentielles

    Émile Picard. Mémoire sur la théorie des équations différentielles. Journal de Mathématiques Pures et Appliquées, 6:145–210, 1890

  26. [26]

    Machine learning pipeline for battery state-of-health estimation

    Diego Roman, Saurabh Saxena, Valentin Robu, Michael Pecht, and David Flynn. Machine learning pipeline for battery state-of-health estimation. Nature Machine Intelligence, 3(5):447–456, 2021

  27. [27]

    Battery data set

    Bhaskar Saha and Kai Goebel. Battery data set. In NASA AMES Prognostics Data Repository, 2008

  28. [28]

    Edward H. Simpson. The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society, Series B, 13(2):238–241, 1951

  29. [29]

    Physics-integrated variational autoencoders for robust and interpretable generative modeling

    Naoya Takeishi and Alexandros Kalousis. Physics-integrated variational autoencoders for robust and interpretable generative modeling. In Advances in Neural Information Processing Systems (NeurIPS), 2021

  30. [30]

    BatteryLife: A comprehensive dataset and benchmark for battery life prediction

    Ruifeng Tan, Jiayuan Hong, Kai Wang, Jia Zhang, Jia Li, et al. BatteryLife: A comprehensive dataset and benchmark for battery life prediction. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2025. arXiv:2502.18807

  31. [31]

    Wavelet neural operator for solving parametric partial differential equations in computational mechanics problems

    Tapas Tripura and Souvik Chakraborty. Wavelet neural operator for solving parametric partial differential equations in computational mechanics problems. Computer Methods in Applied Mechanics and Engineering, 404:115783, 2023

  32. [32]

    Respecting causality for training physics-informed neural networks

    Sifan Wang, Shyam Sankaran, and Paris Perdikaris. Respecting causality for training physics-informed neural networks. Computer Methods in Applied Mechanics and Engineering, 421:116813, 2022

  33. [33]

    Gege Wen, Zongyi Li, Kamyar Azizzadenesheli, Anima Anandkumar, and Sally M. Benson. U-FNO: An enhanced Fourier neural operator-based deep-learning model for multiphase flow. Advances in Water Resources, 163:104180, 2022

  34. [34]

    Physics-informed temporal alignment for auto-regressive PDE foundation models

    Chenxi Zhu, Xiao Xu, Jiawei Han, and Jintai Chen. Physics-informed temporal alignment for auto-regressive PDE foundation models. In Proceedings of the 42nd International Conference on Machine Learning (ICML), 2025