pith. machine review for the scientific record.

arxiv: 2604.11422 · v1 · submitted 2026-04-13 · 💻 cs.LG · cs.AI

Recognition: unknown

Emulating Non-Differentiable Metrics via Knowledge-Guided Learning: Introducing the Minkowski Image Loss

Filippo Quarenghi, Ryan Cotsakis, Tom Beucler


Pith reviewed 2026-05-10 16:08 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords Minkowski functionals · differentiable surrogates · Lipschitz constraints · precipitation fields · Earth system modeling · neural emulation · topological metrics

The pith

Constrained neural networks emulate non-differentiable Minkowski measures for precipitation fields without geometric errors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Non-differentiable metrics such as Minkowski functionals for precipitation area, perimeter, and connectivity block direct gradient-based training in Earth system deep learning, leaving models trained on smooth proxies such as MSE prone to blurry outputs. The paper develops two remedies: an analytical relaxation of the discrete operations via temperature-controlled sigmoids, and neural surrogates built from Lipschitz-constrained convolutional networks that combine spectral normalization with hard architectural constraints to respect geometric principles. Together these yield the Minkowski image loss, a differentiable functional validated on the EUMETNET OPERA precipitation dataset. The constrained emulator attains high accuracy and eliminates the geometric violations that unconstrained networks produce. Application to deterministic super-resolution, however, reveals a trade-off: the stability conferred by the constraints comes at the cost of over-smoothed gradients that fail to capture localized convective textures.
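The temperature-controlled sigmoid relaxation can be sketched in a few lines for the simplest functional, the excursion-set area. This is an illustrative reading of the method, not the authors' code; the field, threshold, and temperature values are stand-ins.

```python
import numpy as np

def soft_area(field, u, tau):
    """Differentiable relaxation of the excursion-set area
    A(u) = #{pixels with value > u}: each hard indicator is replaced
    by a sigmoid whose sharpness is set by the temperature tau
    (tau -> 0 recovers the exact, non-differentiable count)."""
    return float(np.sum(1.0 / (1.0 + np.exp(-(field - u) / tau))))

rng = np.random.default_rng(0)
field = rng.random((64, 64))           # stand-in for a precipitation field
hard = float(np.sum(field > 0.5))      # exact, non-differentiable area
approx = soft_area(field, 0.5, 0.01)   # smooth surrogate at low temperature
```

Unlike the hard count, `soft_area` has a nonzero gradient with respect to every pixel, which is exactly what a training loss needs.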

Core claim

We formulate the Minkowski image loss as a differentiable equivalent to the integral-geometric measures of surface precipitation fields by training Lipschitz-convolutional neural networks stabilized through spectral normalization and hard geometric constraints, demonstrating high emulation accuracy and complete elimination of geometric violations on the EUMETNET OPERA dataset.
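The integral-geometric targets named in this claim are cheap to compute exactly, if non-differentiably. A minimal stdlib sketch follows; the 4-connectivity and edge-count perimeter conventions are assumptions for illustration and may differ from the paper's estimators.

```python
from collections import deque

def minkowski_functionals(field, u):
    """Area, perimeter, and connected-component count of the excursion
    set {field > u} on a 2D grid (list of rows). Conventions here --
    4-connectivity, edge-count perimeter -- are illustrative only."""
    h, w = len(field), len(field[0])
    mask = [[field[i][j] > u for j in range(w)] for i in range(h)]
    nbrs = ((1, 0), (-1, 0), (0, 1), (0, -1))

    area = sum(sum(row) for row in mask)

    # Perimeter: each "on" pixel contributes one unit per exposed edge.
    perim = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j]:
                for di, dj in nbrs:
                    ni, nj = i + di, j + dj
                    if not (0 <= ni < h and 0 <= nj < w) or not mask[ni][nj]:
                        perim += 1

    # Connected components via breadth-first flood fill.
    seen = [[False] * w for _ in range(h)]
    comps = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                comps += 1
                seen[i][j] = True
                q = deque([(i, j)])
                while q:
                    ci, cj = q.popleft()
                    for di, dj in nbrs:
                        ni, nj = ci + di, cj + dj
                        if 0 <= ni < h and 0 <= nj < w and mask[ni][nj] and not seen[ni][nj]:
                            seen[ni][nj] = True
                            q.append((ni, nj))
    return area, perim, comps
```

None of the three outputs admits a useful gradient with respect to pixel intensities, which is the gap the surrogate is built to bridge.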

What carries the argument

Lipschitz-constrained convolutional neural networks that enforce geometric principles via spectral normalization and architectural constraints to emulate non-differentiable Minkowski functionals.
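The spectral-normalization half of this machinery can be sketched with power iteration: estimate a weight matrix's largest singular value and rescale so each layer is at most 1-Lipschitz. This is a generic sketch of the technique, not the paper's implementation.

```python
import numpy as np

def spectral_normalize(w, n_iters=100):
    """Rescale matrix w so its spectral norm (largest singular value)
    is at most 1, bounding the layer's Lipschitz constant. Uses power
    iteration, the same estimator spectral normalization applies to
    each weight matrix during training."""
    rng = np.random.default_rng(1)
    v = rng.standard_normal(w.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        u = w @ v
        u /= np.linalg.norm(u)
        v = w.T @ u
        v /= np.linalg.norm(v)
    sigma = float(u @ (w @ v))      # converged Rayleigh estimate of sigma_max
    return w / max(sigma, 1.0)      # shrink only; never amplify small weights

rng = np.random.default_rng(0)
w = 3.0 * rng.standard_normal((32, 16))   # illustrative layer weights
w_sn = spectral_normalize(w)
```

Because the composition of 1-Lipschitz layers is itself 1-Lipschitz, bounding each layer this way bounds the whole emulator's gradient scale.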

Load-bearing premise

That the stability-smoothness trade-off observed in deterministic super-resolution can be resolved by coupling the Lipschitz constraints with stochastic generative architectures to recover localized convective textures.

What would settle it

Running a precipitation super-resolution experiment with the Minkowski image loss inside a stochastic generative model and verifying whether the generated fields recover observed high-frequency convective textures while maintaining geometric consistency.

Figures

Figures reproduced from arXiv: 2604.11422 by Filippo Quarenghi, Ryan Cotsakis, Tom Beucler.

Figure 1. Conceptual overview of the differentiable surrogate training framework. The lack of gradient flow (red …
Figure 2. (Left) Lipschitz-bound feature extractor: the backbone acts as a hierarchical encoder; stability is enforced via spectral normalization and residual connections, and global sum pooling preserves the extensivity of geometric features. (Right) Geometric constraint heads: the area Â(u) is constructed by integrating a predicted probability density to enforce monotonicity; the perimeter P̂(u) is condi…
Figure 3. Feature inversion results: qualitative comparison of precipitation fields reconstructed by inverting the target Minkowski vector γ_gt for a test sample (left). All architectures recover the storm's magnitude, but only the constrained models generate the coherent, smooth intensity gradients characteristic of convective cells. (Panels: DEM, original low-resolution field, bicubic interpolation, UNet baseline, UNet with analytical Minkowski loss, …)
Figure 4. Structural fidelity in precipitation downscaling. The unconstrained UNet produces an amorphous shield, missing localized convective peaks; integral-geometric constraints (analytical and Lip-CNN) yield only marginal improvement. The stochastic DDIM baseline recovers textural realism and multi-scale variance, confirming the limitations of deterministic optimization.
Figure 5. Qualitative prediction analysis: comparison of ground-truth (solid) and predicted (dashed) Minkowski functionals. (Top) A well-behaved storm where all models converge. (Bottom) A challenging, fragmented case: unconstrained models (blue/orange) predict geometrically impossible perimeters, violating monotonicity, whereas the constrained model (magenta) maintains consistency by design.
Figure 6. Spectral validation. (Left) Radially averaged power spectral density comparing the energy cascade of the ground truth (black) against the reconstructions. (Right) The spectral ratio S_model(k)/S_ref(k); ideal performance corresponds to y = 1.
Figure 7. Mechanistic evaluation of the surrogate's sensitivity to localized physical perturbations.
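The monotone area head described in Figure 2's caption can be sketched as: predict a strictly positive density over thresholds, then integrate it so that the emulated area can only decrease as the threshold rises. The softplus-plus-cumulative-sum construction below is an assumed reading of "integration of a predicted probability density", not the authors' code.

```python
import numpy as np

def monotone_area_head(raw_scores, total_area):
    """Map unconstrained head outputs to a monotone area curve A_hat(u_k).

    softplus makes each predicted density bin strictly positive; the
    reverse cumulative sum gives the mass above each threshold, so the
    curve decreases as u_k grows -- monotonicity holds by construction,
    not by penalty."""
    density = np.log1p(np.exp(raw_scores))        # softplus > 0
    tail_mass = np.cumsum(density[::-1])[::-1]    # mass at thresholds >= u_k
    return total_area * tail_mass / tail_mass[0]  # anchor A_hat(u_0)

rng = np.random.default_rng(0)
raw = rng.standard_normal(20)          # stand-in head outputs over 20 thresholds
area_curve = monotone_area_head(raw, total_area=4096.0)
```

This is the sense in which the constraints are "hard": no weight setting can produce the non-monotone perimeter/area curves that Figure 5 shows the unconstrained baselines emitting.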
Original abstract

The ``differentiability gap'' presents a primary bottleneck in Earth system deep learning: since models cannot be trained directly on non-differentiable scientific metrics and must rely on smooth proxies (e.g., MSE), they often fail to capture high-frequency details, yielding ``blurry'' outputs. We develop a framework that bridges this gap using two different methods to deal with non-differentiable functions: the first is to analytically approximate the original non-differentiable function into a differentiable equivalent one; the second is to learn differentiable surrogates for scientific functionals. We formulate the analytical approximation by relaxing discrete topological operations using temperature-controlled sigmoids and continuous logical operators. Conversely, our neural emulator uses Lipschitz-convolutional neural networks to stabilize gradient learning via: (1) spectral normalization to bound the Lipschitz constant; and (2) hard architectural constraints enforcing geometric principles. We demonstrate this framework's utility by developing the Minkowski image loss, a differentiable equivalent for the integral-geometric measures of surface precipitation fields (area, perimeter, connected components). Validated on the EUMETNET OPERA dataset, our constrained neural surrogate achieves high emulation accuracy, completely eliminating the geometric violations observed in unconstrained baselines. However, applying these differentiable surrogates to a deterministic super-resolution task reveals a fundamental trade-off: while strict Lipschitz regularization ensures optimization stability, it inherently over-smooths gradient signals, restricting the recovery of highly localized convective textures. This work highlights the necessity of coupling such topological constraints with stochastic generative architectures to achieve full morphological realism.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a framework to bridge the differentiability gap in Earth system deep learning by emulating non-differentiable Minkowski functionals (area, perimeter, connected components) for precipitation fields. It develops both an analytical approximation via temperature-controlled sigmoids and continuous logical operators, and a neural surrogate using Lipschitz-constrained CNNs with spectral normalization and hard geometric architectural constraints. The resulting Minkowski Image Loss is validated on the EUMETNET OPERA dataset, where the constrained surrogate reports high emulation accuracy and eliminates geometric violations observed in unconstrained baselines. When inserted as a loss in a deterministic super-resolution task, a stability-smoothness trade-off is identified, with the suggestion that stochastic generative architectures are needed to recover localized convective textures.

Significance. If the surrogate gradients prove faithful to the underlying non-differentiable functionals, the work could enable direct optimization on physically interpretable geometric metrics rather than MSE proxies, improving morphological realism in precipitation modeling. Strengths include the explicit dual-method approach (analytical and learned), the use of Lipschitz constraints for stability, and the clear identification of the stability-smoothness trade-off as a direction for future research. The validation on real OPERA data and the focus on integral-geometric measures provide a concrete testbed for knowledge-guided surrogates in scientific ML.

major comments (3)
  1. [Abstract, §5] Super-resolution experiments: the central claim that the Lipschitz-constrained surrogate 'completely eliminat[es] the geometric violations' rests on pointwise emulation accuracy, yet the manuscript does not quantify gradient fidelity (e.g., cosine similarity or L2 distance between surrogate gradients and finite-difference approximations of the true Minkowski functionals). This is load-bearing: the differentiability gap is bridged only if the surrogate supplies usable gradients that recover high-frequency structure, and value accuracy alone does not establish this, as the observed over-smoothing shows.
  2. [§4.2] Validation on OPERA data: the reported 'high emulation accuracy' lacks accompanying quantitative detail, such as per-functional MAE or RMSE with error bars, ablations isolating spectral normalization from the hard geometric constraints, and direct comparison of geometric violation rates (e.g., number of disconnected components or perimeter errors) against the true non-differentiable measures. Without these, the strength of the 'complete elimination' claim relative to unconstrained baselines cannot be fully assessed.
  3. [§3.1] Analytical approximation: the temperature parameter controlling the sigmoid relaxations is treated as a tunable hyperparameter. This introduces an additional degree of freedom whose effect on gradient quality and approximation error is not systematically characterized, which may limit the parameter-free character of the analytical path and interact with the Lipschitz constraints in the learned path.
minor comments (2)
  1. [§2] Notation for the Minkowski functionals (area, perimeter, connected components) should be consistently defined with symbols and referenced to the integral-geometry literature in the methods section.
  2. [Figure 4] Figure captions for the super-resolution results should explicitly state the quantitative metrics used to illustrate the stability-smoothness trade-off (e.g., which gradient norm or texture measure is plotted).
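The gradient-fidelity check called for in major comment 1 can be sketched as follows: compare the analytic gradient of the smooth surrogate against central finite differences of the hard functional, here for the excursion-set area. The threshold, temperature, and step size are illustrative choices, not values from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_area_grad(field, u, tau):
    """Analytic per-pixel gradient of the sigmoid-relaxed area."""
    s = sigmoid((field - u) / tau)
    return s * (1.0 - s) / tau

def hard_area_fd_grad(field, u, eps):
    """Central finite differences of the exact, non-differentiable area.

    The hard area is a sum of per-pixel indicators, so perturbing
    pixel i only moves the count if field[i] lies within eps of u."""
    return (((field + eps) > u).astype(float)
            - ((field - eps) > u).astype(float)) / (2.0 * eps)

rng = np.random.default_rng(0)
field = rng.random(4096)                         # flattened stand-in field
g_soft = soft_area_grad(field, u=0.5, tau=0.05)
g_fd = hard_area_fd_grad(field, u=0.5, eps=0.05)
cos = float(g_soft @ g_fd
            / (np.linalg.norm(g_soft) * np.linalg.norm(g_fd)))
```

A cosine similarity near 1 over held-out fields would support the claim that the surrogate supplies usable descent directions, which value accuracy alone cannot establish.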

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which identify key areas for strengthening the validation of our Minkowski Image Loss framework. We address each major comment point by point below, providing clarifications and committing to specific revisions that will enhance the manuscript's rigor without altering its core contributions.

read point-by-point responses
  1. Referee: [Abstract, §5] Super-resolution experiments: the central claim that the Lipschitz-constrained surrogate 'completely eliminat[es] the geometric violations' rests on pointwise emulation accuracy, yet the manuscript does not quantify gradient fidelity (e.g., cosine similarity or L2 distance between surrogate gradients and finite-difference approximations of the true Minkowski functionals). This is load-bearing: the differentiability gap is bridged only if the surrogate supplies usable gradients that recover high-frequency structure, and value accuracy alone does not establish this, as the observed over-smoothing shows.

    Authors: We agree that gradient fidelity is a critical aspect for validating the surrogate's ability to bridge the differentiability gap, particularly given the stability-smoothness trade-off observed in the super-resolution experiments. Our current claims of 'completely eliminating the geometric violations' are grounded in the surrogate outputs satisfying the integral-geometric properties (e.g., no spurious disconnected components or perimeter inconsistencies) on the OPERA validation set, which we view as indirect evidence of faithful emulation. However, we acknowledge that explicit quantification of gradient alignment would provide stronger support for the usability of these gradients in optimization. In the revised manuscript, we will add direct comparisons using cosine similarity and L2 distance between the surrogate gradients and finite-difference approximations of the true Minkowski functionals, computed on held-out precipitation fields. This analysis will be presented alongside the existing pointwise accuracy results to better contextualize the over-smoothing behavior. revision: yes

  2. Referee: [§4.2] Validation on OPERA data: the reported 'high emulation accuracy' lacks accompanying quantitative detail, such as per-functional MAE or RMSE with error bars, ablations isolating spectral normalization from the hard geometric constraints, and direct comparison of geometric violation rates (e.g., number of disconnected components or perimeter errors) against the true non-differentiable measures. Without these, the strength of the 'complete elimination' claim relative to unconstrained baselines cannot be fully assessed.

    Authors: We appreciate this observation, as the current manuscript relies on qualitative descriptions and the absence of violations in the constrained model to support the 'high emulation accuracy' statement. To enable a more rigorous assessment, the revised version will include detailed quantitative metrics: per-functional MAE and RMSE (for area, perimeter, and connected components) with error bars derived from multiple random seeds or cross-validation folds on the EUMETNET OPERA dataset. We will also add ablation experiments that isolate the effects of spectral normalization from the hard geometric architectural constraints. Finally, we will report explicit geometric violation rates—such as the mean number of disconnected components and perimeter deviation errors—for the constrained surrogate, unconstrained baselines, and the ground-truth non-differentiable computations, allowing direct comparison of the 'complete elimination' claim. revision: yes

  3. Referee: [§3.1] Analytical approximation: the temperature parameter controlling the sigmoid relaxations is treated as a tunable hyperparameter. This introduces an additional degree of freedom whose effect on gradient quality and approximation error is not systematically characterized, which may limit the parameter-free character of the analytical path and interact with the Lipschitz constraints in the learned path.

    Authors: We concur that treating the temperature as a tunable hyperparameter introduces an extra degree of freedom that warrants further examination, especially for assessing its influence on gradient quality and any potential interactions with the learned Lipschitz-constrained path. The analytical approximation is intended as a complementary, interpretable alternative rather than a strictly parameter-free method, but we agree that its sensitivity has not been fully documented. In the revision, we will incorporate a systematic sensitivity study, including tables or figures that vary the temperature across a range of values and report the resulting changes in approximation error (MAE to true functionals) and gradient characteristics (e.g., norm and alignment with finite differences). This will clarify the trade-offs and strengthen the presentation of both the analytical and learned approaches. revision: yes
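The sensitivity study the authors commit to above can be sketched as a temperature sweep: relax the exact area at several temperatures and record the absolute error against the hard count. The grid, threshold, and temperature values are illustrative, not the paper's experimental settings.

```python
import numpy as np

def soft_area(field, u, tau):
    """Sigmoid-relaxed excursion-set area; tau -> 0 recovers the hard count."""
    return float(np.sum(1.0 / (1.0 + np.exp(-(field - u) / tau))))

rng = np.random.default_rng(42)
field = rng.random((64, 64))           # stand-in precipitation field
hard = float(np.sum(field > 0.5))      # exact, non-differentiable area

# Sweep the relaxation temperature and record absolute error
# against the exact count.
errors = {tau: abs(soft_area(field, 0.5, tau) - hard)
          for tau in (0.2, 0.1, 0.05, 0.01)}
# In expectation, smaller temperatures track the exact count more
# closely, at the price of sharper, higher-variance gradients.
```

The same sweep, extended to gradient norm and alignment with finite differences, would make the promised trade-off tables concrete.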

Circularity Check

0 steps flagged

No significant circularity; empirical surrogate training on external data

full rationale

The paper's core contribution is an empirical neural surrogate trained directly against the target non-differentiable Minkowski functionals (area, perimeter, connected components) on the external EUMETNET OPERA dataset, using spectral normalization and hard geometric constraints. No derivation step reduces a claimed prediction or uniqueness result to a fitted parameter, self-citation chain, or input by construction. The stability-smoothness trade-off is explicitly flagged as an open limitation rather than resolved tautologically. All load-bearing claims rest on held-out validation accuracy and baseline comparisons, which are falsifiable outside the model's own outputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on the assumption that temperature-controlled sigmoids can faithfully relax discrete topological operations and that hard Lipschitz constraints plus spectral normalization will produce stable yet accurate surrogates; aside from the sigmoid temperature, no free parameters are explicitly named in the abstract.

free parameters (1)
  • temperature parameter for sigmoid relaxations
    Controls the sharpness of the continuous approximation to discrete topological operations such as counting connected components.
axioms (1)
  • domain assumption: Lipschitz continuity of the emulator guarantees gradient stability during back-propagation
    Invoked to justify spectral normalization and architectural constraints.

pith-pipeline@v0.9.0 · 5583 in / 1257 out tokens · 60030 ms · 2026-05-10T16:08:31.329101+00:00 · methodology

