arxiv: 2509.24517 · v2 · pith:BOCTSUBLnew · submitted 2025-09-29 · 💻 cs.LG

Physics Priors Offer Useful Accuracy-Carbon Trade-Offs in Spatio-Temporal Forecasting

Pith reviewed 2026-05-22 12:57 UTC · model grok-4.3

classification 💻 cs.LG
keywords physics priors · spatio-temporal forecasting · carbon footprint · model efficiency · inductive biases · Navier-Stokes · incompressible shear flow · deep learning

The pith

Models with stronger physics priors achieve substantially lower training carbon footprints for spatio-temporal forecasting, but the advantage does not extend straightforwardly to inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares neural network models equipped with strong, weak, or no physics inductive biases on the task of forecasting incompressible shear flow, which obeys the Navier-Stokes equations. Stronger priors deliver clear reductions in training energy use and carbon emissions while preserving accuracy. These gains do not appear reliably during inference, so the overall carbon picture depends on evaluating both stages. Readers care because large-scale machine learning models now carry significant energy costs across their entire lifecycle, and the work shows one practical route to lowering those costs without sacrificing performance.

Core claim

Models with stronger physics priors achieve substantially lower training footprints, but this advantage does not straightforwardly extend to inference, highlighting the importance of evaluating carbon costs across the full model lifecycle rather than any single stage.

What carries the argument

Physics inductive biases of varying strength inserted into neural networks for predicting flows governed by the Navier-Stokes equations.

If this is right

  • Carbon costs must be measured across training and inference rather than at one stage alone.
  • Model efficiency should stand as a core design goal alongside accuracy in machine learning development.
  • Physics priors can deliver useful accuracy-carbon trade-offs on tasks whose governing equations are known.
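
The first point above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and every number in it are hypothetical placeholders, not values from the paper. It adds a one-off training footprint to a per-call inference footprint, then finds the number of inference calls at which a model that is cheaper to train but costlier to run stops being the lower-carbon choice.

```python
import math


def lifecycle_co2e(train_g, infer_g_per_call, n_calls):
    """Total footprint in g CO2e: one training run plus n_calls inference calls."""
    return train_g + infer_g_per_call * n_calls


def break_even_calls(train_a, infer_a, train_b, infer_b):
    """Smallest number of inference calls at which model A's total exceeds model B's.

    Covers only the scenario where A is no costlier to train and strictly
    costlier per call; returns None otherwise.
    """
    training_saving = train_b - train_a  # what A saves up front
    extra_per_call = infer_a - infer_b   # what A costs extra on each call
    if training_saving < 0 or extra_per_call <= 0:
        return None
    return math.floor(training_saving / extra_per_call) + 1


# Hypothetical footprints in g CO2e (placeholders, not measurements).
strong_prior = {"train": 4000, "infer": 3}
no_prior = {"train": 10000, "infer": 1}

n = break_even_calls(strong_prior["train"], strong_prior["infer"],
                     no_prior["train"], no_prior["infer"])
print(n)  # 3001
```

With these made-up numbers the two totals are equal at 3000 calls, so a comparison that stops at training and one that assumes heavy deployment can point in opposite directions.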

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same prior-strength approach could be tested on larger or higher-dimensional flow datasets to check whether training savings scale.
  • When training compute is the dominant cost, practitioners might prefer strong-physics models even if inference costs remain comparable.
  • Quantifying carbon across data preparation and deployment stages would complete the lifecycle picture the paper begins.

Load-bearing premise

The incompressible shear flow task is representative enough of other spatio-temporal forecasting problems that the measured accuracy-carbon trade-offs will hold when the same physics-prior approach is applied elsewhere.

What would settle it

Running the identical comparison on a different forecasting problem with known governing equations, such as weather or ocean currents, and finding no training-footprint reduction for the strong-prior models would falsify the central trade-off claim.

Figures

Figures reproduced from arXiv: 2509.24517 by Jens Hesselbjerg Christensen, Raghavendra Selvan, Sophia N. Wilson.

Figure 1. a: Conceptual map of modelling approaches by data availability (vertical axis) and domain knowledge (horizontal axis). Physics-informed ML lies in the middle, combining data-driven flexibility with physics priors. b: Test MSE versus CO2eq emissions for a data-driven NN (red squares), an unsupervised PINN (blue dots), and a semi-supervised PINN (grey triangles). The dashed line marks the Pareto front, domin…

Figure 2. Modelling the harmonic oscillator. Training and extrapolation predictions when using NNs with different activation functions: ReLU, Tanh, Sine, Snake, and using a PINN. Revisiting the Bias-Variance Trade-off. It is commonly understood in ML that strong model assumptions (bias) risk under-fitting, whereas high sensitivity to training data can lead to over-fitting (variance) (Kohavi et al., 1996). Striking…

Figure 3. Spectrum of model families from purely data-driven ML to physics-based solvers. U-net. U-nets (Ronneberger et al., 2015) can be used for spatio-temporal forecasting by mapping a window of past states to future ones. They capture local correlations by fusing features across multiple spatial scales, but do not explicitly constrain the dynamics beyond what is enforced by the training data. Mild forms of in…

Figure 4. Target and training data for viscous Burgers' equation. a: Target data. b: Supervised samples. c: PDE residual samples. d: Initial and boundary condition points. In this experiment, we study models to approximate the solution of the viscous Burgers' equation with Dirichlet boundary conditions on a 2D spatio-temporal grid: $\frac{\partial u}{\partial t} + u\,\frac{\partial u}{\partial x} - \nu\,\frac{\partial^2 u}{\partial x^2} = 0$, $u(x, 0) = -\sin(\pi x)$, $u(1, t) = u(-1, t) = 0$. f…

Figure 5. Damped harmonic oscillator predictions. Mean predictions (solid lines) with standard deviation (shaded areas) from 10 runs. Results are shown for the two best-performing models: Snake and PINN. The results in…

Figure 6. Pearson r over 20 rollout steps. Values are averaged over the four fields: velocity components, tracer, and pressure. Model labels are ordered from no bias (top) to strong bias (bottom).

Figure 7. Example rollout predictions for one of the four fields (tracer). Target trajectory…

Figure 8. Predictive performance and carbon footprint.

Figure 9. Example of incompressible shear flow data. Temporal evolution of tracer (s),…

Figure 10. Predictive performance and rollout carbon footprint.

Figure 11. Example of rollout predictions for pressure field. Target trajectory (top row)…

Figure 12. Example of rollout predictions for horizontal velocity field. Target trajectory…

Figure 13. Example of rollout predictions for vertical velocity field. Target trajectory (top…
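
The Figure 1 caption refers to a Pareto front in the plane of test MSE against CO2eq emissions. As a rough illustration of what that front means, the sketch below keeps every model that no other model matches or beats on both axes at once. The function name and the sample points are hypothetical and are not taken from the paper.

```python
def pareto_front(points):
    """Return the non-dominated (co2e, mse) pairs, sorted by co2e.

    Lower is better on both axes. A point is dominated if another point
    is no worse on both axes and strictly better on at least one.
    """
    front = []
    best_mse = float("inf")
    # Sorting by emissions first means a point survives only if it
    # improves on the best MSE seen among all cheaper-or-equal points.
    for co2e, mse in sorted(set(points)):
        if mse < best_mse:
            front.append((co2e, mse))
            best_mse = mse
    return front


# Hypothetical (CO2eq, test MSE) pairs, one per trained model.
models = [(1.0, 0.9), (2.0, 0.5), (3.0, 0.7), (4.0, 0.2), (5.0, 0.2)]
print(pareto_front(models))  # [(1.0, 0.9), (2.0, 0.5), (4.0, 0.2)]
```

In this made-up sample, the model at (3.0, 0.7) is dominated because the model at (2.0, 0.5) emits less and is more accurate.
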
Original abstract

Development of modern deep learning methods has been driven primarily by the push for improving model efficacy (accuracy metrics). This sole focus on efficacy has steered development of large-scale models that require massive computational resources, and results in considerable energy consumption and corresponding carbon footprint across the model lifecycle. In this work, we explore how physics inductive biases can offer useful trade-offs between model efficacy and model efficiency (compute, energy, and carbon). We study models with strong, weak, and no physics-inductive biases for spatio-temporal forecasting of incompressible shear flow, a task governed by the Navier-Stokes equations. We find that models with stronger physics priors achieve substantially lower training footprints, but this advantage does not straightforwardly extend to inference, highlighting the importance of evaluating carbon costs across the full model lifecycle rather than any single stage. We argue that model efficiency, along with model efficacy, should become a core consideration driving machine learning model development and deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper examines accuracy-carbon trade-offs in spatio-temporal forecasting by comparing deep learning models with strong, weak, and no physics inductive biases on the task of predicting incompressible shear flow governed by the Navier-Stokes equations. It reports that stronger physics priors yield substantially lower training carbon footprints, though this advantage does not extend straightforwardly to inference, and concludes that model efficiency should be considered alongside efficacy in ML development.

Significance. If the empirical findings hold under broader testing, the work would usefully highlight lifecycle carbon costs in scientific ML and demonstrate that physics priors can reduce training footprints without sacrificing accuracy on this flow problem. The emphasis on evaluating both training and inference stages is a constructive contribution to green AI discussions, though the single-task scope limits immediate generalizability.

major comments (2)
  1. [Experimental setup and results] The central claim that stronger physics priors achieve substantially lower training footprints (and the associated accuracy-carbon trade-off) rests entirely on results from the incompressible shear flow task. No other spatio-temporal forecasting problems, governing equations, or datasets are evaluated, so it is unclear whether the reported training-footprint reductions are an artifact of this specific flow regime, boundary conditions, or data statistics rather than a general property of physics priors.
  2. [Abstract and §4 (Results)] The abstract and main claims state directional findings on training footprints without supplying quantitative values, error bars, baseline model details, dataset sizes, or hardware specifications. This absence makes it impossible to judge the magnitude of the carbon savings or to reproduce the comparison between strong/weak/no physics-prior variants.
minor comments (2)
  1. [Figures] Figure captions and axis labels should explicitly state the units and normalization used for carbon footprint (e.g., kg CO2e) and whether values are per epoch, per run, or total.
  2. [Introduction] The manuscript would benefit from a short related-work paragraph contrasting the physics-prior approach with other efficiency techniques such as pruning or quantization that have been applied to spatio-temporal models.
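
The first minor comment turns on how a reported footprint is computed and normalised. One common convention, assumed here rather than taken from the paper, multiplies measured energy by a data-centre overhead factor (PUE) and a grid carbon intensity. The sketch below applies that convention and reports the result per epoch, on average, and in total, with the unit stated explicitly. All names and numbers are hypothetical.

```python
def co2e_kg(energy_kwh, intensity_g_per_kwh, pue=1.0):
    """Convert energy to kg CO2e: kWh x PUE x (g CO2e per kWh) / 1000."""
    return energy_kwh * pue * intensity_g_per_kwh / 1000.0


def footprint_report(energy_per_epoch_kwh, intensity_g_per_kwh, pue=1.0):
    """Summarise one training run with the unit and normalisation made explicit."""
    if not energy_per_epoch_kwh:
        raise ValueError("need at least one epoch of energy measurements")
    per_epoch = [co2e_kg(e, intensity_g_per_kwh, pue) for e in energy_per_epoch_kwh]
    total = sum(per_epoch)
    return {
        "unit": "kg CO2e",
        "per_epoch": per_epoch,
        "mean_per_epoch": total / len(per_epoch),
        "total": total,
    }


# Hypothetical run: two epochs, 500 g CO2e per kWh grid intensity, PUE of 2.0.
report = footprint_report([1.0, 3.0], intensity_g_per_kwh=500.0, pue=2.0)
print(report["unit"], report["total"])  # kg CO2e 4.0
```

Returning the per-epoch, mean, and total values together is a design choice that leaves no doubt about which normalisation a plotted number uses.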

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below, clarifying our position and outlining revisions to improve the paper's clarity and scope.

Point-by-point responses
  1. Referee: [Experimental setup and results] The central claim that stronger physics priors achieve substantially lower training footprints (and the associated accuracy-carbon trade-off) rests entirely on results from the incompressible shear flow task. No other spatio-temporal forecasting problems, governing equations, or datasets are evaluated, so it is unclear whether the reported training-footprint reductions are an artifact of this specific flow regime, boundary conditions, or data statistics rather than a general property of physics priors.

    Authors: We agree that the study is confined to the incompressible shear flow task governed by the Navier-Stokes equations, which was selected as a canonical, well-characterized benchmark to isolate the effects of varying physics inductive biases under controlled conditions. While this focused setting enables rigorous comparison of strong, weak, and no-prior models, we acknowledge that broader validation across additional spatio-temporal problems would be needed to establish generality. In the revised manuscript we will expand the discussion and limitations sections to explicitly address the single-task scope, discuss why this flow regime is representative for studying physics priors, and outline concrete directions for future multi-task evaluations. revision: partial

  2. Referee: [Abstract and §4 (Results)] The abstract and main claims state directional findings on training footprints without supplying quantitative values, error bars, baseline model details, dataset sizes, or hardware specifications. This absence makes it impossible to judge the magnitude of the carbon savings or to reproduce the comparison between strong/weak/no physics-prior variants.

    Authors: We appreciate this observation and agree that the abstract and results presentation would benefit from greater specificity. In the revised manuscript we will augment the abstract with concise quantitative highlights (e.g., relative training-carbon reductions and associated accuracy metrics) and ensure Section 4 includes error bars, baseline architecture details, dataset sizes, and hardware specifications to support reproducibility and magnitude assessment. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical comparison of physics-prior models on measured carbon and accuracy

Full rationale

The paper conducts an experimental study comparing model variants with strong, weak, and no physics priors on the incompressible shear flow task. Reported accuracy-carbon trade-offs are obtained by direct measurement of training and inference footprints on the chosen dataset; no equations, derivations, or fitted parameters are defined in terms of the target quantities. No self-citation chains, ansatzes, or uniqueness theorems are invoked to justify the central results. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Only the abstract is available, so the ledger is limited to the domain assumption stated in the abstract; no free parameters or invented entities are described.

axioms (1)
  • domain assumption: The forecasting task is governed by the Navier-Stokes equations for incompressible shear flow.
    This premise justifies the use of physics inductive biases and is invoked to motivate the model variants.


Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 3 internal anchors

  4. L. F. W. Anthony, B. Kanding, and R. Selvan. Carbontracker: Tracking and predicting the carbon footprint of training deep learning models, 2020. URL https://arxiv.org/abs/2007.03051

  5. P. Bakhtiarifard, C. Igel, and R. Selvan. Ec-nas: Energy consumption aware tabular benchmarks for neural architecture search. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), page 5660–5664. IEEE, Apr. 2024. doi:10.1109/icassp48485.2024.10448303. URL http://dx.doi.org/10.1109/ICASSP48485.2024.10448303

  6. G. Baldan, Q. Liu, A. Guardone, and N. Thuerey. Flow matching meets pdes: A unified framework for physics-constrained generation, 2025. URL https://arxiv.org/abs/2506.08604

  7. B. R. Bartoldson, B. Kailkhura, and D. Blalock. Compute-efficient deep learning: Algorithmic trends and opportunities. Journal of Machine Learning Research, 24(122):1–77, 2023

  8. J. Brehmer, S. Behrends, P. de Haan, and T. Cohen. Does equivariance matter at scale?, 2024. URL https://arxiv.org/abs/2410.23179

  9. K. J. Burns, G. M. Vasil, J. S. Oishi, D. Lecoanet, and B. P. Brown. Dedalus: A flexible framework for numerical simulations with spectral methods. Physical Review Research, 2(2):023068, 2020. doi:10.1103/PhysRevResearch.2.023068

  10. B. Cottier, R. Rahman, L. Fattorini, N. Maslej, T. Besiroglu, and D. Owen. The rising costs of training frontier ai models. arXiv preprint arXiv:2405.21015, 2024

  11. Q. Dao, H. Phung, B. Nguyen, and A. Tran. Flow matching in latent space. arXiv preprint arXiv:2307.08698, 2023

  12. N. A. Disch, Y. Kirchhoff, R. Peretzke, M. Rokuss, S. Roy, C. Ulrich, D. Zimmerer, and K. Maier-Hein. Temporal flow matching for learning spatio-temporal trajectories in 4d longitudinal medical imaging, 2025. URL https://arxiv.org/abs/2508.21580

  13. S. Dutta, N. Innan, S. B. Yahia, and M. Shafique. AQ-PINNs: Attention-enhanced quantum physics-informed neural networks for carbon-efficient climate modeling. URL http://arxiv.org/abs/2409.01626

  14. M. Evchenko, J. Vanschoren, H. H. Hoos, M. Schoenauer, and M. Sebag. Frugal machine learning. arXiv preprint arXiv:2111.03731, 2021

  15. P. Henderson, J. Hu, J. Romoff, E. Brunskill, D. Jurafsky, and J. Pineau. Towards the systematic reporting of the energy and carbon footprints of machine learning, 2022. URL https://arxiv.org/abs/2002.05651

  16. IEA. Electricity 2025. https://www.iea.org/reports/electricity-2025, 2025. Accessed: 2025-09-20

  17. M. Jahani-nasab and M. A. Bijarchi. Enhancing convergence speed with feature enforcing physics-informed neural networks using boundary conditions as prior knowledge. Scientific Reports, 14:23836, 2024. doi:10.1038/s41598-024-74711-y. URL https://doi.org/10.1038/s41598-024-74711-y

  18. T. Kapoor, A. Chandra, A. Stamou, and S. J. Roberts. Beyond accuracy: Ecol2 metric for sustainable neural pde solvers. arXiv preprint arXiv:2505.12556, 2025

  19. N. Khoa. Tutorials for physics-informed neural networks (pinns), 2022. URL https://github.com/nguyenkhoa0209/pinns_tutorial. Accessed: 2024-07-12

  20. D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR (Poster), 2015. URL http://arxiv.org/abs/1412.6980

  21. R. Kohavi, D. H. Wolpert, et al. Bias plus variance decomposition for zero-one loss functions. In Proceedings of the International Conference on Machine Learning (ICML), volume 96, pages 275–283, 1996

  22. J. Kossaifi, N. Kovachki, K. Azizzadenesheli, and A. Anandkumar. Multi-grid tensorized fourier neural operator for high-resolution pdes, 2023. URL https://arxiv.org/abs/2310.00120

  23. N. B. Kovachki, Z. Li, K. Azizzadenesheli, K. Bhattacharya, A. M. Stuart, and A. Anandkumar. Neural operator: Learning maps between function spaces. Journal of Machine Learning Research, 24(89):1–97, 2023

  24. R. Lam, A. Sanchez-Gonzalez, M. Willson, P. Wirnsberger, M. Fortunato, F. Alet, S. Ravuri, T. Ewalds, Z. Eaton-Rosen, W. Hu, A. Merose, S. Hoyer, G. Holland, O. Vinyals, J. Stott, A. Pritzel, S. Mohamed, and P. Battaglia. GraphCast: Learning skillful medium-range global weather forecasting, 2023. URL http://arxiv.org/abs/2212.12794

  25. Z. Li, A. Zhou, and A. B. Farimani. Generative latent neural pde solver using flow matching, 2025. URL https://arxiv.org/abs/2503.22600

  26. S. H. Lim, Y. Wang, A. Yu, E. Hart, M. W. Mahoney, X. S. Li, and N. B. Erichson. Elucidating the design choice of probability paths in flow matching for forecasting, 2025. URL https://arxiv.org/abs/2410.03229

  27. Y. Lipman, R. T. Q. Chen, H. Ben-Hamu, M. Nickel, and M. Le. Flow matching for generative modeling, 2023. URL https://arxiv.org/abs/2210.02747. Published as a conference paper at ICLR 2023

  28. D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45:503–528, 1989. doi:10.1007/BF01589116

  29. Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie. A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11976–11986, 2022. doi:10.1109/CVPR52688.2022.01167

  30. I. Loshchilov and F. Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=Bkg6RiCqY7

  31. L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis. Learning nonlinear operators via deeponet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, 2021. doi:10.1038/s42256-021-00302-5. URL https://doi.org/10.1038/s42256-021-00302-5

  32. A. S. Luccioni, S. Viguier, and A.-L. Ligozat. Estimating the carbon footprint of bloom, a 176b parameter language model. Journal of Machine Learning Research, 24(253):1–15, 2023

  33. G. Moro, L. Ruggazzi, and L. Valgimigli. Carburacy: Summarization models tuning and comparison in eco-sustainable regimes with a novel carbon-aware accuracy. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 14417–14425, 2023. doi:10.1609/aaai.v37i12.26686

  34. R. Ohana, M. McCabe, L. Meyer, R. Morel, F. J. Agocs, M. Beneitez, M. Berger, B. Burkhart, S. Dalziel, D. Fielding, et al. The well: A large-scale collection of diverse physics simulations for machine learning. Dataset, 2024. URL https://doi.org/10.17863/CAM.113689. Accepted at NeurIPS 2024 Datasets and Benchmarks

  35. S. Patra, S. Panda, B. K. Parida, M. Arya, K. Jacobs, D. I. Bondar, and A. Sen. Physics informed kolmogorov-arnold neural networks for dynamical analysis via efficent-kan and wav-kan, 2024. URL https://arxiv.org/abs/2407.18373

  36. A. F. Psaros, R. Pestourie, P. T. Rakich, S. G. Johnson, P. Dey, P. Das, J. Moucer, et al. Physics-enhanced deep surrogates for partial differential equations. Nature Machine Intelligence, 5:1458–1468, 2023. doi:10.1038/s42256-023-00761-y

  37. M. A. Rahman, Z. E. Ross, and K. Azizzadenesheli. U-no: U-shaped neural operators. Transactions on Machine Learning Research, 2023. URL https://openreview.net/forum?id=rE7Xhez8mE

  38. M. Raissi, P. Perdikaris, and G. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019. ISSN 0021-9991. doi:10.1016/j.jcp.2018.10.045. URL https://www.sciencedirect.com/scie...

  39. O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, volume 9351 of LNCS, pages 234–241. Springer, 2015. doi:10.1007/978-3-319-24574-4_28

  40. J. Sevilla, L. Heim, A. Ho, T. Besiroglu, M. Hobbhahn, and P. Villalobos. Compute trends across three eras of machine learning. In 2022 International Joint Conference on Neural Networks (IJCNN), page 1–8. IEEE, July 2022. doi:10.1109/ijcnn55064.2022.9891914. URL http://dx.doi.org/10.1109/IJCNN55064.2022.9891914

  41. E. Strubell, A. Ganesh, and A. McCallum. Energy and policy considerations for deep learning in NLP. In A. Korhonen, D. Traum, and L. Màrquez, editors, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3645–3650, Florence, Italy, July 2019. Association for Computational Linguistics. doi:10.18653/v1/P19-135...

  42. M. Tan and Q. Le. Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning, pages 6105–6114. PMLR, 2019

  43. A. Van Wynsberghe. Sustainable AI: AI for sustainability and the sustainability of AI. 1(3):213–218, 2021. ISSN 2730-5953, 2730-5961. doi:10.1007/s43681-021-00043-6. URL https://link.springer.com/10.1007/s43681-021-00043-6

  44. Y. Zhao, Y. Liu, B. Jiang, and T. Guo. Ce-nas: An end-to-end carbon-efficient neural architecture search framework. Advances in Neural Information Processing Systems, 37:82673–82704, 2024

  45. Y. D. Zhong, B. Dey, and A. Chakraborty. Benchmarking energy-conserving neural networks for learning dynamics from data. In A. Jadbabaie, J. Lygeros, G. J. Pappas, P. A. Parrilo, B. Recht, C. J. Tomlin, and M. N. Zeilinger, editors, Proceedings of the 3rd Conference on Learning for Dynamics and Control, volume 144 of Proceedings of Machine Learning Resear...

  46. L. Ziyin, T. Hartwig, and M. Ueda. Neural networks fail to learn periodic functions and how to fix it, 2020. URL https://arxiv.org/abs/2006.08195