pith. machine review for the scientific record. sign in

arxiv: 2605.09975 · v1 · submitted 2026-05-11 · 💻 cs.LG · math.OC

Recognition: no theorem link

Chebyshev Center-Based Direction Selection for Multi-Objective Optimization and Training PINNs

Dabeen Lee, Hoyeol Yoon, Nam Ho-Nguyen, Seoungbin Bae

Pith reviewed 2026-05-12 02:20 UTC · model grok-4.3

classification 💻 cs.LG math.OC
keywords physics-informed neural networksmulti-objective optimizationChebyshev centerdual conedirection selectionnonconvex optimizationPINN training
0
0 comments X

The pith

Selecting PINN update directions as the Chebyshev center in the dual cone unifies scale robustness and simultaneous descent under one geometric rule.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formulates update-direction selection during PINN training as the problem of locating the Chebyshev center inside the dual cone formed by the loss gradients. This center is the normalized vector that maximizes its shortest distance to any facet of the cone. The single geometric choice produces scale invariance and guaranteed descent on every loss term without separate enforcement of those properties. The formulation reduces to an efficient lower-dimensional dual problem and supplies a convergence guarantee that holds even when the composite loss is nonconvex. It also supplies a common geometric language for relating and comparing earlier direction-selection techniques.

Core claim

The authors show that the update direction is recovered by solving for the Chebyshev center of the dual cone, defined as the normalized point that maximizes the minimum distance to the cone facets. This choice automatically recovers scale robustness and simultaneous descent on all objectives. The resulting program admits an efficient dual formulation in much lower dimension and carries a convergence guarantee for nonconvex multi-objective losses. On several PINN benchmark problems the method exhibits strong empirical performance.

What carries the argument

The Chebyshev center of the dual cone: the normalized direction inside the cone that maximizes the radius of the largest ball touching all facets, used to construct the parameter update.

If this is right

  • The direction is automatically scale-invariant and descends on every loss term at once.
  • The dual formulation reduces computational cost by working in a space whose dimension equals the number of loss terms rather than the model dimension.
  • Convergence is guaranteed for nonconvex losses without additional regularity assumptions.
  • Earlier direction-selection heuristics appear as special cases or approximations of the same geometric principle.
  • The approach supplies a uniform basis for comparing and extending related multi-objective methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same dual-cone construction could be applied directly to other multi-task learning settings that require balanced gradient steps.
  • Warm-starting the dual solver from the previous iteration might further reduce per-step cost during long training runs.
  • Extending the criterion to include local curvature information could produce faster practical convergence while preserving the geometric guarantees.
  • The formulation invites systematic comparison with Pareto-front methods to see which geometric rule better matches practitioner needs.

Load-bearing premise

Maximizing the minimum distance to the facets of the dual cone is the single geometric criterion that automatically produces all the practically useful properties without extra constraints.

What would settle it

An experiment in which the computed direction either fails to produce descent on at least one loss term or loses scale robustness after rescaling the losses would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.09975 by Dabeen Lee, Hoyeol Yoon, Nam Ho-Nguyen, Seoungbin Bae.

Figure 1
Figure 1. Figure 1: Geometric illustration of the proposed direction-selection rule, shown in the Euclidean [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: A three-objective example illustrating geometric differences with some representative [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Coefficient field reconstruction for the inverse heat problem. We visualize the ground-truth [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗
read the original abstract

Physics-informed neural networks (PINNs) are a promising approach for solving partial differential equations (PDEs). Their training, however, is often difficult because multiple loss terms induced by PDE residuals and boundary or initial conditions must be optimized simultaneously. To address this difficulty, existing approaches often construct update directions by explicitly enforcing particular desirable properties, such as scale robustness and simultaneous descent. While effective in many cases, such property-by-property designs can make it unclear which conditions are essential, what geometric principle determines the selected update direction, and how different methods are structurally related. In this work, we formulate update-direction selection for PINN training as a Chebyshev-center problem in the dual cone. The proposed formulation selects a normalized direction that maximizes the minimum distance to the cone facets. The resulting formulation admits an efficient dual problem in a much lower-dimensional space and yields a convergence guarantee in the nonconvex setting. It also recovers the key desirable properties targeted by existing approaches without imposing them separately; rather, they follow from the single geometric criterion underlying the formulation. This makes the selected direction interpretable through a single geometric rule and provides a unified basis for systematically comparing related direction-selection methods. Experiments on several PINN benchmarks further demonstrate strong empirical performance of the proposed method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper formulates update-direction selection for multi-objective optimization and PINN training as a Chebyshev-center problem inside the dual cone of the loss gradients. It selects a normalized direction that maximizes the minimum distance to the cone facets, derives an efficient low-dimensional dual problem, proves a nonconvex convergence guarantee, and claims that scale robustness, simultaneous descent, and related properties emerge automatically from this single geometric rule rather than from explicit constraints.

Significance. If the central geometric claim holds, the work supplies a unified, interpretable basis for comparing direction-selection methods and removes the need for property-by-property engineering. The efficient dual formulation and nonconvex convergence result would be practically useful for PINN training and other multi-task settings.

major comments (3)
  1. [§3.2] §3.2, the dual-cone Chebyshev-center formulation: the manuscript must explicitly verify that the maximin-distance direction d satisfies ∇L_i · d < 0 for every individual loss gradient (simultaneous descent) and remains invariant under positive rescaling of any L_i. The skeptic note indicates this is not automatic when facets are built from un-normalized gradients; a short derivation or counter-example check is required to confirm no hidden normalization or post-processing is used.
  2. [Theorem 4.1] Theorem 4.1 (nonconvex convergence): the proof sketch relies on the selected direction being a strict descent direction for the vector of losses. If the geometric property in §3.2 does not guarantee simultaneous descent for all terms, the convergence argument does not go through; the theorem statement and its hypotheses must be tightened to match the actual properties delivered by the Chebyshev center.
  3. [§5] §5, experimental section: the reported benchmark gains are presented without ablation on the dual-cone construction versus explicit scale-normalization baselines. If the claimed automatic recovery of scale robustness is the key novelty, the tables should isolate whether removing any implicit normalization changes the performance gap.
minor comments (2)
  1. Notation for the dual cone and its facets is introduced without a small diagram; a one-sentence geometric illustration would help readers unfamiliar with cone duality.
  2. The abstract states 'yields a convergence guarantee in the nonconvex setting' but does not specify the precise assumptions (e.g., Lipschitz constants, bounded gradients). This should be stated explicitly in the abstract or introduction.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the geometric properties and strengthen the empirical support. We address each major comment below and will incorporate revisions to make the claims fully explicit.

read point-by-point responses
  1. Referee: [§3.2] §3.2, the dual-cone Chebyshev-center formulation: the manuscript must explicitly verify that the maximin-distance direction d satisfies ∇L_i · d < 0 for every individual loss gradient (simultaneous descent) and remains invariant under positive rescaling of any L_i. The skeptic note indicates this is not automatic when facets are built from un-normalized gradients; a short derivation or counter-example check is required to confirm no hidden normalization or post-processing is used.

    Authors: We agree that an explicit verification strengthens the presentation. By construction, the Chebyshev center lies in the strict interior of the dual cone, which is the set of directions d satisfying ∇L_i · d < 0 for all i (simultaneous descent). For scale invariance: positive rescaling of any gradient leaves the dual cone unchanged because the bounding hyperplanes are defined by the rays of the gradients; the maximin-distance point (normalized) is therefore identical. We will insert a short derivation and a one-line counter-example check (scaling a single gradient) in §3.2 to confirm no post-processing is used. revision: yes

  2. Referee: [Theorem 4.1] Theorem 4.1 (nonconvex convergence): the proof sketch relies on the selected direction being a strict descent direction for the vector of losses. If the geometric property in §3.2 does not guarantee simultaneous descent for all terms, the convergence argument does not go through; the theorem statement and its hypotheses must be tightened to match the actual properties delivered by the Chebyshev center.

    Authors: With the explicit verification added to §3.2, the direction is guaranteed to be a strict descent direction for every loss term. Consequently the hypotheses of Theorem 4.1 remain valid as stated. We will augment the proof sketch with a direct reference to the new derivation in §3.2 so that the logical chain is transparent. revision: partial

  3. Referee: [§5] §5, experimental section: the reported benchmark gains are presented without ablation on the dual-cone construction versus explicit scale-normalization baselines. If the claimed automatic recovery of scale robustness is the key novelty, the tables should isolate whether removing any implicit normalization changes the performance gap.

    Authors: We concur that an ablation isolating the geometric construction from explicit normalization is valuable. We will add a new table (or supplementary table) that compares (i) the proposed method, (ii) the same method with forced gradient normalization, and (iii) representative baselines that enforce scale robustness by hand. This will quantify whether the performance gap persists when normalization is removed, directly supporting the claim of automatic recovery. revision: yes

Circularity Check

0 steps flagged

No circularity: formulation derives properties from independent geometric criterion

full rationale

The paper defines the update direction via a new Chebyshev-center optimization in the dual cone that maximizes min-distance to facets, then states that scale robustness, simultaneous descent, and convergence follow directly from this single rule without separate enforcement. No self-citations appear as load-bearing premises, no parameters are fitted to data and relabeled as predictions, and no prior result by the same authors is invoked to force uniqueness or smuggle an ansatz. The derivation chain is therefore self-contained: the geometric program is stated, its dual is derived, and the claimed properties are asserted to be logical consequences rather than inputs renamed as outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard convex-geometry notions (dual cone, Chebyshev center) applied to the vector of loss gradients; no new entities are postulated and no free parameters are introduced in the abstract.

axioms (1)
  • domain assumption The set of loss gradients defines a cone whose dual admits a well-defined Chebyshev center that corresponds to a useful update direction.
    Invoked when the paper states that the formulation selects a normalized direction maximizing minimum distance to cone facets.

pith-pipeline@v0.9.0 · 5531 in / 1332 out tokens · 46504 ms · 2026-05-12T02:20:01.149252+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 1 internal anchor

  1. [1]

    Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations.Journal of Computational physics, 378:686–707, 2019

  2. [2]

    Physics-informed machine learning.Nature Reviews Physics, 3(6):422–440, 2021

    George Em Karniadakis, Ioannis G Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. Physics-informed machine learning.Nature Reviews Physics, 3(6):422–440, 2021

  3. [3]

    Characterizing possible failure modes in physics-informed neural networks.Advances in neural information processing systems, 34:26548–26560, 2021

    Aditi Krishnapriyan, Amir Gholami, Shandian Zhe, Robert Kirby, and Michael W Mahoney. Characterizing possible failure modes in physics-informed neural networks.Advances in neural information processing systems, 34:26548–26560, 2021

  4. [4]

    Scientific machine learning through physics–informed neural networks: Where we are and what’s next.Journal of Scientific Computing, 92(3):88, 2022

    Salvatore Cuomo, Vincenzo Schiano Di Cola, Fabio Giampaolo, Gianluigi Rozza, Maziar Raissi, and Francesco Piccialli. Scientific machine learning through physics–informed neural networks: Where we are and what’s next.Journal of Scientific Computing, 92(3):88, 2022

  5. [5]

    A comprehensive study of non-adaptive and residual-based adaptive sampling for physics-informed neural networks

    Chenxi Wu, Min Zhu, Qinyang Tan, Yadhu Kartha, and Lu Lu. A comprehensive study of non-adaptive and residual-based adaptive sampling for physics-informed neural networks. Computer Methods in Applied Mechanics and Engineering, 403:115671, 2023

  6. [6]

    On the eigenvector bias of fourier feature networks: From regression to solving multi-scale pdes with physics-informed neural networks

    Sifan Wang, Hanwen Wang, and Paris Perdikaris. On the eigenvector bias of fourier feature networks: From regression to solving multi-scale pdes with physics-informed neural networks. Computer Methods in Applied Mechanics and Engineering, 384:113938, 2021

  7. [7]

    Understanding and mitigating gradient flow pathologies in physics-informed neural networks.SIAM Journal on Scientific Computing, 43 (5):A3055–A3081, 2021

    Sifan Wang, Yujun Teng, and Paris Perdikaris. Understanding and mitigating gradient flow pathologies in physics-informed neural networks.SIAM Journal on Scientific Computing, 43 (5):A3055–A3081, 2021

  8. [8]

    Efficient training of physics-informed neural networks via importance sampling.Computer-Aided Civil and Infrastructure Engineering, 36(8):962–977, 2021

    Mohammad Amin Nabian, Rini Jasmine Gladstone, and Hadi Meidani. Efficient training of physics-informed neural networks via importance sampling.Computer-Aided Civil and Infrastructure Engineering, 36(8):962–977, 2021. 12

  9. [9]

    Das-pinns: A deep adaptive sampling method for solving high-dimensional partial differential equations.Journal of Computational Physics, 476:111868, 2023

    Kejun Tang, Xiaoliang Wan, and Chao Yang. Das-pinns: A deep adaptive sampling method for solving high-dimensional partial differential equations.Journal of Computational Physics, 476:111868, 2023

  10. [10]

    Failure-informed adaptive sampling for pinns.SIAM Journal on Scientific Computing, 45(4):A1971–A1994, 2023

    Zhiwei Gao, Liang Yan, and Tao Zhou. Failure-informed adaptive sampling for pinns.SIAM Journal on Scientific Computing, 45(4):A1971–A1994, 2023

  11. [11]

    Ameya D Jagtap and George Em Karniadakis. Extended physics-informed neural networks (xpinns): A generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations.Communications in Computational Physics, 28(5), 2020

  12. [12]

    Separable physics-informed neural networks.Advances in Neural Information Processing Systems, 36:23761–23788, 2023

    Junwoo Cho, Seungtae Nam, Hyunmo Yang, Seok-Bae Yun, Youngjoon Hong, and Eunbyung Park. Separable physics-informed neural networks.Advances in Neural Information Processing Systems, 36:23761–23788, 2023

  13. [13]

    Aditya Prakash

    Zhiyuan Zhao, Xueying Ding, and B. Aditya Prakash. PINNsformer: A transformer-based framework for physics-informed neural networks. InThe Twelfth International Conference on Learning Representations, 2024. URLhttps://openreview.net/forum?id=DO2WFXU1Be

  14. [14]

    When and why pinns fail to train: A neural tangent kernel perspective.Journal of Computational Physics, 449:110768, 2022

    Sifan Wang, Xinling Yu, and Paris Perdikaris. When and why pinns fail to train: A neural tangent kernel perspective.Journal of Computational Physics, 449:110768, 2022

  15. [15]

    Multi-objective loss balancing for physics-informed deep learning.Computer Methods in Applied Mechanics and Engineering, 439:117914, 2025

    Rafael Bischof and Michael A Kraus. Multi-objective loss balancing for physics-informed deep learning.Computer Methods in Applied Mechanics and Engineering, 439:117914, 2025

  16. [16]

    Dual cone gradient descent for training physics-informed neural networks.Advances in Neural Information Processing Systems, 37:98563–98595, 2024

    Youngsik Hwang and Dong-Young Lim. Dual cone gradient descent for training physics-informed neural networks.Advances in Neural Information Processing Systems, 37:98563–98595, 2024

  17. [17]

    ConFIG: Towards conflict-free training of physics informed neural networks

    Qiang Liu, Mengyu Chu, and Nils Thuerey. ConFIG: Towards conflict-free training of physics informed neural networks. InThe Thirteenth International Conference on Learning Represen- tations, 2025. URLhttps://openreview.net/forum?id=APojAzJQiq

  18. [18]

    Harmonized cone for feasible and non-conflict directions in training physics-informed neural networks

    Dohyun Bu, Yujung Byun, and Jong-Seok Lee. Harmonized cone for feasible and non-conflict directions in training physics-informed neural networks. InThe Fourteenth International Conference on Learning Representations, 2026. URLhttps://openreview.net/forum?id= PRYl1mO1go

  19. [19]

    Towards impartial multi-task learning

    Liyang Liu, Yi Li, Zhanghui Kuang, Jing-Hao Xue, Yimin Chen, Wenming Yang, Qingmin Liao, and Wayne Zhang. Towards impartial multi-task learning. InInternational conference on learning representations, 2021

  20. [20]

    Multi-task learning as multi-objective optimization.Advances in neural information processing systems, 31, 2018

    Ozan Sener and Vladlen Koltun. Multi-task learning as multi-objective optimization.Advances in neural information processing systems, 31, 2018

  21. [21]

    Gradient- adaptive policy optimization: Towards multi-objective alignment of large language models

    Chengao Li, Hanyu Zhang, Yunkun Xu, Hongyan Xue, Xiang Ao, and Qing He. Gradient- adaptive policy optimization: Towards multi-objective alignment of large language models. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11214–11232, 2025. 13

  22. [22]

    Steepest descent methods for multicriteria optimization

    Jörg Fliege and Benar Fux Svaiter. Steepest descent methods for multicriteria optimization. Mathematical methods of operations research, 51(3):479–494, 2000

  23. [23]

    Gradient surgery for multi-task learning.Advances in neural information processing systems, 33:5824–5836, 2020

    Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, and Chelsea Finn. Gradient surgery for multi-task learning.Advances in neural information processing systems, 33:5824–5836, 2020

  24. [24]

    Conflict-averse gradient descent for multi-task learning.Advances in neural information processing systems, 34:18878–18890, 2021

    Bo Liu, Xingchao Liu, Xiaojie Jin, Peter Stone, and Qiang Liu. Conflict-averse gradient descent for multi-task learning.Advances in neural information processing systems, 34:18878–18890, 2021

  25. [25]

    Gradient-adaptive pareto optimization for constrained reinforcement learning

    Zixian Zhou, Mengda Huang, Feiyang Pan, Jia He, Xiang Ao, Dandan Tu, and Qing He. Gradient-adaptive pareto optimization for constrained reinforcement learning. InProceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 11443–11451, 2023

  26. [26]

    Pmgda: A preference-based multiple gradient descent algorithm.IEEE Transactions on Emerging Topics in Computational Intelligence, 2025

    Xiaoyuan Zhang, Xi Lin, and Qingfu Zhang. Pmgda: A preference-based multiple gradient descent algorithm.IEEE Transactions on Emerging Topics in Computational Intelligence, 2025

  27. [27]

    Reward-free alignment for conflicting objectives.arXiv preprint arXiv:2602.02495, 2026

    Peter L Chen, Xiaopeng Li, Xi Chen, and Tianyi Lin. Reward-free alignment for conflicting objectives.arXiv preprint arXiv:2602.02495, 2026

  28. [28]

    Multiadam: Parameter-wise scale-invariant optimizer for multiscale training of physics-informed neural networks

    Jiachen Yao, Chang Su, Zhongkai Hao, Songming Liu, Hang Su, and Jun Zhu. Multiadam: Parameter-wise scale-invariant optimizer for multiscale training of physics-informed neural networks. InInternational conference on machine learning, pages 39702–39721. PMLR, 2023

  29. [29]

    En- hancing stability of physics-informed neural network training through saddle-point reformula- tion

    Dmitry Bylinkin, Mikhail Aleksandrov, Savelii Chezhegov, and Aleksandr Beznosikov. En- hancing stability of physics-informed neural network training through saddle-point reformula- tion. InThe Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=EQNp3sFrY3

  30. [30]

    Cambridge university press, 2004

    Stephen Boyd and Lieven Vandenberghe.Convex optimization. Cambridge university press, 2004

  31. [31]

    Adam: A Method for Stochastic Optimization

    Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980, 2014

  32. [32]

    Pinnacle: A comprehensive benchmark of physics-informed neural networks for solving pdes.Advances in Neural Information Processing Systems, 37: 76721–76774, 2024

    Zhongkai Hao, Jiachen Yao, Chang Su, Hang Su, Ziao Wang, Fanzhi Lu, Zeyu Xia, Yichi Zhang, Songming Liu, Lu Lu, et al. Pinnacle: A comprehensive benchmark of physics-informed neural networks for solving pdes.Advances in Neural Information Processing Systems, 37: 76721–76774, 2024

  33. [33]

    Understanding the difficulty of training deep feedforward neuralnetworks

    Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neuralnetworks. InProceedings of the thirteenth international conference on artificial intelligence and statistics, pages 249–256. JMLR Workshop and Conference Proceedings, 2010. 14 A Additional detail and proofs for section 3 A.1 Proof of proposition 3.2 Recall the...