pith. machine review for the scientific record.

arxiv: 2604.14451 · v1 · submitted 2026-04-15 · 🌌 astro-ph.CO · cs.AI · cs.CV · physics.data-an

Recognition: unknown

FAIR Universe Weak Lensing ML Uncertainty Challenge: Handling Uncertainties and Distribution Shifts for Precision Cosmology

Pith reviewed 2026-05-10 11:38 UTC · model grok-4.3

classification 🌌 astro-ph.CO · cs.AI · cs.CV · physics.data-an
keywords weak lensing · machine learning · cosmological parameters · distribution shifts · systematic uncertainties · benchmark dataset · challenge

The pith

A benchmark dataset and challenge standardize machine learning tests for extracting cosmological parameters from weak lensing data under limited training data and realistic systematics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the first weak lensing benchmark dataset that incorporates several realistic systematic effects such as those expected in upcoming surveys. It launches the FAIR Universe Weak Lensing Machine Learning Uncertainty Challenge, which requires participants to recover fundamental universe properties from this data despite small training sets and distribution shifts between simulations and observations. The effort supplies a common testbed so that different methods can be compared rigorously rather than across incompatible simulation setups. A reader would care because weak lensing is a key probe of matter distribution and dark energy, yet current machine-learning approaches are hampered by exactly the computational cost, modeling inaccuracy, and lack of standardization addressed here.

Core claim

We present the first weak lensing benchmark dataset with several realistic systematics and launch the FAIR Universe Weak Lensing Machine Learning Uncertainty Challenge. The challenge focuses on measuring the fundamental properties of the universe from weak lensing data with a limited training set and potential distribution shifts, while providing a standardized benchmark for rigorous comparison across methods. Organized in two phases, the challenge will bring together the physics and ML communities to advance the methodologies for handling systematic uncertainties, data efficiency, and distribution shifts in weak lensing analysis with ML, ultimately facilitating the deployment of ML approaches into upcoming weak lensing survey analysis.

What carries the argument

The FAIR Universe Weak Lensing benchmark dataset that embeds realistic systematics and distribution shifts between simulations and observations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Methods that succeed on the challenge may transfer more reliably to real survey pipelines where simulation fidelity is imperfect.
  • The two-phase structure could expose which uncertainty-handling techniques scale best when training data remain scarce.
  • Standardization of this sort may reduce the current practice of each group publishing results on its own private simulation suite.

Load-bearing premise

The simulations used to build the benchmark dataset accurately reproduce the distribution shifts and systematic effects that will appear in real observational data from upcoming surveys.

What would settle it

A direct comparison in which the benchmark dataset's simulated galaxy shape distributions or power spectra deviate measurably from those measured in actual early data from surveys such as LSST or Euclid in ways that change recovered cosmological parameters.
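
As a rough illustration of the comparison described above, the sketch below computes per-bin fractional residuals and a simple chi-square between a simulated and an observed binned power spectrum. The function names, the toy band powers, and the assumption of independent Gaussian errors per bin are illustrative only; a real analysis would use the full covariance matrix and the survey's own binning.

```python
def fractional_residuals(sim, obs):
    """Per-bin fractional difference (sim - obs) / obs."""
    return [(s - o) / o for s, o in zip(sim, obs)]


def chi_square(sim, obs, sigma):
    """Chi-square assuming independent Gaussian errors per bin (an assumption)."""
    return sum(((s - o) / e) ** 2 for s, o, e in zip(sim, obs, sigma))


# Toy band powers in arbitrary units; not taken from the paper.
sim_cl = [1.1, 2.0, 2.7]
obs_cl = [1.0, 2.0, 3.0]
err_cl = [0.1, 0.1, 0.1]

print(fractional_residuals(sim_cl, obs_cl))
print(chi_square(sim_cl, obs_cl, err_cl))
```

A large chi-square relative to the number of bins would indicate a mismatch, but whether that mismatch changes the recovered cosmological parameters still has to be checked by rerunning the inference.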

Figures

Figures reproduced from arXiv: 2604.14451 by Benjamin Nachman, Biwei Dai, Chris Harris, David Rousseau, Elham E Khoda, Ibrahim Elsharkawy, Ihsan Ullah, Isabelle Guyon, Jordan Dudley, Paolo Calafiura, Po-Wen Chang, Ragansu Chakkappai, Sascha Diefenbacher, Steven Farrell, Uroš Seljak, Wahid Bhimji, Yuan-Tang Chou, Yulei Zhang.

Figure 1: Example noiseless and noisy weak lensing convergence maps. The top panel and bottom …
Figure 2: The 101 cosmological parameters (Ωm, S8) in our suite of weak lensing simulations. … cosmological surveys like Rubin Observatory LSST and Euclid. The shape noise can be added as a post-processing step during the training. In total, we generate 256 noiseless weak lensing maps with different realizations and different systematic parameters for each cosmological model. Each map has dimension 1424 × 176, so the …
Figure 3: OoD detection with an OoD score t(x) given test samples x. OoD instances are detected if t(x) is greater than a predefined threshold (a tunable parameter). The figure is adapted from Ref. [43]. … 3σ). Focusing on this range therefore rewards models with high detection power under practically meaningful false-positive constraints, while logarithmic spacing emphasizes performance at the smallest FPRs. …
Figure 4: A comparison between the posterior distributions inferred by our baseline methods.
Figure 5: A comparison between all baseline methods for the Phase-2 task.
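
The thresholding rule in the Figure 3 caption can be sketched in a few lines. The example below is a minimal illustration, not the challenge's scoring code: it picks a threshold on the OoD score t(x) from in-distribution scores at a target false-positive rate (FPR), then measures the fraction of shifted samples flagged. The function names, the toy scores, and the FPR endpoints are assumptions; the caption excerpt does not give the actual FPR range.

```python
import math


def log_spaced_fprs(fpr_min, fpr_max, n):
    """Return n target false-positive rates evenly spaced in log10 (n >= 2)."""
    lo, hi = math.log10(fpr_min), math.log10(fpr_max)
    return [10 ** (lo + i * (hi - lo) / (n - 1)) for i in range(n)]


def threshold_at_fpr(in_dist_scores, fpr):
    """Pick a threshold so that at most a fraction `fpr` of
    in-distribution scores lie strictly above it."""
    ranked = sorted(in_dist_scores, reverse=True)
    k = math.floor(fpr * len(ranked))  # in-distribution samples allowed above
    return ranked[k] if k < len(ranked) else -math.inf


def detection_rate(ood_scores, threshold):
    """Fraction of OoD samples flagged, i.e. with t(x) > threshold."""
    return sum(s > threshold for s in ood_scores) / len(ood_scores)


in_dist = list(range(100))   # hypothetical scores t(x) on in-distribution maps
ood = [85, 95, 100, 120]     # hypothetical scores on shifted maps
thr = threshold_at_fpr(in_dist, 0.1)
print(thr, detection_rate(ood, thr))
print(log_spaced_fprs(1e-3, 1e-1, 3))  # placeholder FPR endpoints
```

Evaluating `detection_rate` at each value returned by `log_spaced_fprs` gives a detection-power curve that weights the smallest FPRs more heavily, which is the behaviour the caption excerpt attributes to logarithmic spacing.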
Original abstract

Weak gravitational lensing, the correlated distortion of background galaxy shapes by foreground structures, is a powerful probe of the matter distribution in our universe and allows accurate constraints on the cosmological model. In recent years, high-order statistics and machine learning (ML) techniques have been applied to weak lensing data to extract the nonlinear information beyond traditional two-point analysis. However, these methods typically rely on cosmological simulations, which poses several challenges: simulations are computationally expensive, limiting most realistic setups to a low training data regime; inaccurate modeling of systematics in the simulations creates distribution shifts that can bias cosmological parameter constraints; and varying simulation setups across studies make method comparison difficult. To address these difficulties, we present the first weak lensing benchmark dataset with several realistic systematics and launch the FAIR Universe Weak Lensing Machine Learning Uncertainty Challenge. The challenge focuses on measuring the fundamental properties of the universe from weak lensing data with a limited training set and potential distribution shifts, while providing a standardized benchmark for rigorous comparison across methods. Organized in two phases, the challenge will bring together the physics and ML communities to advance the methodologies for handling systematic uncertainties, data efficiency, and distribution shifts in weak lensing analysis with ML, ultimately facilitating the deployment of ML approaches into upcoming weak lensing survey analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript announces the FAIR Universe Weak Lensing ML Uncertainty Challenge and presents a new benchmark dataset for weak lensing cosmology. The dataset incorporates multiple systematics (including photo-z errors, intrinsic alignments, and shear calibration biases) and distribution shifts between training and test sets. The challenge is structured in two phases to develop and compare ML methods that extract cosmological parameters from weak lensing data under limited training data and potential distribution shifts, with the goal of standardizing method evaluation and facilitating deployment to upcoming surveys.

Significance. If the included systematics and shifts are representative, the standardized benchmark and community challenge could enable more rigorous cross-method comparisons of ML approaches for weak lensing, addressing key barriers such as simulation cost, systematic modeling inaccuracies, and lack of common testbeds. This has the potential to accelerate development of uncertainty-aware ML techniques suitable for precision cosmology with surveys like LSST and Euclid. The explicit community organization bridging physics and ML is a constructive contribution.

major comments (2)
  1. [§3 (Benchmark Dataset Description)] The central claim that the dataset includes 'several realistic systematics' enabling tests under distribution shifts is not supported by any quantitative validation. No comparisons (e.g., power-spectrum residuals, Kolmogorov-Smirnov tests on summary statistics, or direct matches to existing survey data or higher-fidelity mocks) are provided to show that the modeled effects produce shifts representative of real observational data. This validation is load-bearing for the challenge's utility in preparing methods for deployment to upcoming surveys.
  2. [§5 (Challenge Phases and Evaluation)] The description of the two-phase challenge does not specify the exact metrics or protocols that will be used to score uncertainty quantification and robustness to distribution shifts. Without these details, it is unclear how the benchmark will enforce rigorous, reproducible comparisons across submitted methods.
minor comments (2)
  1. [Abstract] The abstract and introduction could more explicitly list the specific systematics included in the benchmark (e.g., the exact parameterization of intrinsic alignments or photo-z error distributions) to allow immediate assessment of scope.
  2. [Figures and Tables] Figure captions and table descriptions should include the exact simulation parameters (e.g., cosmology, galaxy number density, redshift range) used to generate the dataset for reproducibility.
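
One of the validation checks named in major comment 1, a Kolmogorov-Smirnov test on a summary statistic, could look roughly like the sketch below. It computes only the two-sample KS statistic (the largest gap between two empirical CDFs) in plain Python; the function name and the toy inputs are hypothetical, and a full test would also convert the statistic into a p-value.

```python
def ks_two_sample(a, b):
    """Two-sample KS statistic: max over x of |F_a(x) - F_b(x)|,
    where F_a and F_b are the empirical CDFs of the two samples."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Hypothetical per-map summary statistics (e.g. a peak count per map)
# from the benchmark simulations and from a reference mock or survey.
benchmark_stat = [1, 2, 3, 4]
reference_stat = [3, 4, 5, 6]
print(ks_two_sample(benchmark_stat, reference_stat))
```

A statistic near 0 means the two distributions of the summary statistic are hard to tell apart; a value near 1 means they barely overlap.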

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for recognizing the potential of the benchmark and challenge to advance ML methods for weak lensing cosmology. We address each major comment below and will revise the manuscript accordingly to improve clarity and rigor.

Point-by-point responses
  1. Referee: §3 (Benchmark Dataset Description): The central claim that the dataset includes 'several realistic systematics' enabling tests under distribution shifts is not supported by any quantitative validation. No comparisons (e.g., power-spectrum residuals, Kolmogorov-Smirnov tests on summary statistics, or direct matches to existing survey data or higher-fidelity mocks) are provided to show that the modeled effects produce shifts representative of real observational data. This validation is load-bearing for the challenge's utility in preparing methods for deployment to upcoming surveys.

    Authors: We agree that quantitative validation strengthens the claim of realism. The current manuscript describes the systematics (photo-z errors, intrinsic alignments, shear calibration) using standard models from the literature and notes the induced distribution shifts between training and test sets, but does not include direct statistical comparisons. In the revised version we will add a dedicated subsection to §3 presenting power-spectrum residuals, Kolmogorov-Smirnov tests on summary statistics, and comparisons against existing survey data and higher-fidelity mocks to demonstrate that the modeled shifts are representative. revision: yes

  2. Referee: §5 (Challenge Phases and Evaluation): The description of the two-phase challenge does not specify the exact metrics or protocols that will be used to score uncertainty quantification and robustness to distribution shifts. Without these details, it is unclear how the benchmark will enforce rigorous, reproducible comparisons across submitted methods.

    Authors: We acknowledge that the manuscript currently outlines the two phases at a high level and refers readers to the challenge website for full scoring details. To make the paper self-contained, the revised §5 will explicitly list the planned evaluation metrics (e.g., coverage probability and calibration error for uncertainty quantification; relative degradation in parameter constraints under distribution shifts for robustness) together with the submission and ranking protocols. We will note that final numerical thresholds may be refined after community feedback but that the core metrics and reproducibility requirements will be fixed in the manuscript. revision: yes
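
To make the metrics named in the second response concrete, the sketch below computes an empirical coverage probability and a simple calibration error for one parameter. It assumes each submission reports a Gaussian posterior (mean and standard deviation) per test map, which is an assumption made here for illustration; the function names and toy numbers are hypothetical and this is not the challenge's scoring code.

```python
from statistics import NormalDist


def empirical_coverage(truths, means, sigmas, level):
    """Fraction of test cases whose true value falls inside the central
    Gaussian credible interval of nominal probability `level`."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # half-width in units of sigma
    hits = sum(abs(t - m) <= z * s for t, m, s in zip(truths, means, sigmas))
    return hits / len(truths)


def calibration_error(truths, means, sigmas, levels):
    """Mean absolute gap between empirical and nominal coverage."""
    gaps = [abs(empirical_coverage(truths, means, sigmas, q) - q) for q in levels]
    return sum(gaps) / len(gaps)


# Toy values for a single parameter; not taken from the paper.
truths = [0.0, 0.5, 1.5, 3.0]
means = [0.0, 0.0, 0.0, 0.0]
sigmas = [1.0, 1.0, 1.0, 1.0]

print(empirical_coverage(truths, means, sigmas, 0.68))
print(calibration_error(truths, means, sigmas, [0.68, 0.95]))
```

Empirical coverage below the nominal level points to overconfident posteriors, and coverage above it points to overly wide ones.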

Circularity Check

0 steps flagged

No circularity: benchmark dataset release with no derivations or self-referential predictions

The paper contains no derivation chain, first-principles results, fitted parameters, or predictions that could reduce to inputs by construction. It is a resource announcement describing a simulated weak-lensing dataset with listed systematics and the organization of a community challenge. The central claim is the existence and utility of the benchmark itself; no equations, uniqueness theorems, or self-citations are invoked to derive any quantitative result. The reader's assessment of zero circularity is therefore confirmed: the work is self-contained as a data and challenge release without any load-bearing logical loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper does not introduce new physical models, free parameters, or axioms; it organizes existing simulation techniques into a public benchmark and challenge.

pith-pipeline@v0.9.0 · 5613 in / 1211 out tokens · 38512 ms · 2026-05-10T11:38:20.029070+00:00 · methodology

Reference graph

Works this paper leans on

44 extracted references · 13 canonical work pages · 1 internal anchor

  [1] H. Hoekstra and B. Jain, Weak Gravitational Lensing and Its Cosmological Applications, Annual Review of Nuclear and Particle Science 58 (Nov., 2008) 99, arXiv:0805.0139
  [2] M. Kilbinger, Cosmology with cosmic shear observations: a review, Reports on Progress in Physics 78 (July, 2015) 086901, arXiv:1411.0115
  [3] F. Lanusse et al., The dawes review 10: The impact of deep learning for the analysis of galaxy surveys, Publications of the Astronomical Society of Australia 40 (2023) e001
  [4] K. Cranmer, J. Brehmer, and G. Louppe, The frontier of simulation-based inference, Proceedings of the National Academy of Sciences 117 (2020) 30055
  [5] J. Fluri, T. Kacprzak, A. Refregier, A. Amara, A. Lucchi, and T. Hofmann, Cosmological constraints from noisy convergence maps through deep learning, Physical Review D 98 (2018) 123518
  [6] A. Gupta, J. M. Zorrilla Matilla, D. Hsu, and Z. Haiman, Non-Gaussian information from weak lensing data via deep learning, Physical Review D 97 (May, 2018) 103515, arXiv:1802.01212 [astro-ph.CO]
  [7] B. Dai and U. Seljak, Translation and rotation equivariant normalizing flow (trenf) for optimal cosmological analysis, Monthly Notices of the Royal Astronomical Society 516 (2022) 2363
  [8] T. Lu, Z. Haiman, and X. Li, Cosmological constraints from hsc survey first-year data using deep learning, Monthly Notices of the Royal Astronomical Society 521 (2023) 2050
  [9] B. Dai and U. Seljak, Multiscale flow for robust and optimal cosmological analysis, Proceedings of the National Academy of Sciences 121 (2024) e2309624121
  [10] D. Sharma, B. Dai, and U. Seljak, A comparative study of cosmological constraints from weak lensing using convolutional neural networks, Journal of Cosmology and Astroparticle Physics 2024 (2024) 010
  [11] S. Cheng, G. A. Marques, D. Grandón, L. Thiele, M. Shirasaki, B. Ménard, and J. Liu, Cosmological constraints from weak lensing scattering transform using hsc y1 data, Journal of Cosmology and Astroparticle Physics 2025 (2025) 006
  [12] N. Jeffrey, L. Whiteway, M. Gatti, J. Williamson, J. Alsing, A. Porredon, J. Prat, C. Doux, B. Jain, C. Chang, et al., Dark energy survey year 3 results: likelihood-free, simulation-based wCDM inference with neural compression of weak-lensing map statistics, Monthly Notices of the Royal Astronomical Society 536 (2025) 1303
  [13] M. von Wietersheim-Kramsta, K. Lin, N. Tessore, B. Joachimi, A. Loureiro, R. Reischke, and A. H. Wright, Kids-sbi: Simulation-based inference analysis of kids-1000 cosmic shear, Astronomy & Astrophysics 694 (2025) A223
  [14] J. Zeghal, D. Lanzieri, F. Lanusse, A. Boucaud, G. Louppe, E. Aubourg, and A. E. Bayer, Simulation-based inference benchmark for weak lensing cosmology, Astronomy & Astrophysics 699 (2025) A327
  [15] F. Villaescusa-Navarro, D. Anglés-Alcázar, S. Genel, D. N. Spergel, Y. Li, B. Wandelt, A. Nicola, L. Thiele, S. Hassan, J. M. Z. Matilla, et al., Multifield cosmology with artificial intelligence, arXiv preprint arXiv:2109.09747 (2021)
  [16] Z. Xu, S. Escalera, A. Pavão, M. Richard, W.-W. Tu, Q. Yao, H. Zhao, and I. Guyon, Codabench: Flexible, easy-to-use, and reproducible meta-benchmark platform, Patterns 3 (2022) 100543. https://www.sciencedirect.com/science/article/pii/S2666389922001465
  [17] L. Benato et al., FAIR Universe HiggsML Uncertainty Dataset and Competition, arXiv:2410.02867 [hep-ph]
  [18] S. Amrouche, L. Basara, P. Calafiura, V. Estrade, S. Farrell, D. R. Ferreira, L. Finnie, N. Finnie, C. Germain, V. V. Gligorov, T. Golling, S. Gorbunov, H. Gray, I. Guyon, M. Hushchyn, V. Innocente, M. Kiehn, E. Moyse, J.-F. Puget, Y. Reina, D. Rousseau, A. Salzburger, A. Ustyuzhanin, J.-R. Vlimant, J. S. Wind, T. Xylouris, and Y. Yilmaz, The trackin...
  [19] G. Kasieczka et al., The LHC Olympics 2020 a community challenge for anomaly detection in high energy physics, Rept. Prog. Phys. 84 (2021) 124201, arXiv:2101.08320 [hep-ph]
  [20] Z. Liu, A. Pavao, Z. Xu, S. Escalera, F. Ferreira, I. Guyon, S. Hong, F. Hutter, R. Ji, J. C. S. J. Junior, G. Li, M. Lindauer, Z. Luo, M. Madadi, T. Nierhoff, K. Niu, C. Pan, D. Stoll, S. Treguer, J. Wang, P. Wang, C. Wu, Y. Xiong, A. Zela, and Y. Zhang, Winning solutions and post-challenge analyses of the chalearn autodl challenge 2019, IEEE Transactio...
  [21] A. E. Baz, I. Ullah, et al., Lessons learned from the neurips 2021 metadl challenge: Backbone fine-tuning without episodic meta-learning dominates for few-shot learning image classification, PMLR (2022, to appear)
  [22] D. Carrión-Ojeda, H. Chen, A. E. Baz, S. Escalera, C. Guan, I. Guyon, I. Ullah, X. Wang, and W. Zhu, Neurips’22 cross-domain metadl competition: Design and baseline results, 2022
  [23] I. Guyon, G. Dror, V. Lemaire, D. L. Silver, G. Taylor, and D. W. Aha, Analysis of the ijcnn 2011 utl challenge, Neural Networks 32 (2012) 174
  [24] M. L. Danula Hettiachchi, Crowd bias challenge, 2021. https://kaggle.com/competitions/crowd-bias-challenge
  [25] S. P. Federica Proietto, Giovanni Bellitto, Ccai@unict 2023, 2023. https://kaggle.com/competitions/ccaiunict-2023
  [26] Y. Feng, S. Bird, L. Anderson, A. Font-Ribera, and C. Pedersen, Mp-gadget/mp-gadget: A tag for getting a doi, Oct., 2018. https://doi.org/10.5281/zenodo.1451799
  [27] Y. Feng, M.-Y. Chu, U. Seljak, and P. McDonald, FastPM: a new scheme for fast simulations of dark matter and haloes, Mon. Not. Roy. Astron. Soc. 463 (2016) 2273, arXiv:1603.00476 [astro-ph.CO]
  [28] X. Li et al., The three-year shear catalog of the Subaru Hyper Suprime-Cam SSP Survey, Publ. Astron. Soc. Jap. 74 (2022) 421, arXiv:2107.00136 [astro-ph.CO]
  [29] M. M. Rau, R. Dalal, T. Zhang, X. Li, A. J. Nishizawa, S. More, R. Mandelbaum, H. Miyatake, M. A. Strauss, and M. Takada, Weak lensing tomographic redshift distribution inference for the hyper suprime-cam subaru strategic program three-year shape catalogue, Monthly Notices of the Royal Astronomical Society 524 (2023) 5109
  [30] J. Carlson and M. White, Embedding realistic surveys in simulations through volume remapping, The Astrophysical Journal Supplement Series 190 (2010) 311
  [31] B. Jain, U. Seljak, and S. White, Ray-tracing simulations of weak lensing by large-scale structure, The Astrophysical Journal 530 (2000) 547
  [32] S. Hilbert, J. Hartlap, S. White, and P. Schneider, Ray-tracing through the millennium simulation: Born corrections and lens-lens coupling in cosmic shear and galaxy-galaxy lensing, Astronomy & Astrophysics 499 (2009) 31
  [33] A. Petri, Mocking the weak lensing universe: The LensTools Python computing package, Astronomy and Computing 17 (Oct., 2016) 73, arXiv:1606.01903 [astro-ph.CO]
  [34] A. Petri, Z. Haiman, and M. May, Sample variance in weak lensing: how many simulations are required?, Physical Review D 93 (2016) 063524
  [35] X. Li, T. Zhang, S. Sugiyama, R. Dalal, R. Terasawa, M. M. Rau, R. Mandelbaum, M. Takada, S. More, M. A. Strauss, et al., Hyper suprime-cam year 3 results: Cosmology from cosmic shear two-point correlation functions, Physical Review D 108 (2023) 123518
  [36] R. Dalal, X. Li, A. Nicola, J. Zuntz, M. A. Strauss, S. Sugiyama, T. Zhang, M. M. Rau, R. Mandelbaum, M. Takada, et al., Hyper suprime-cam year 3 results: Cosmology from cosmic shear power spectra, Physical Review D 108 (2023) 123519
  [37] T. Abbott, M. Aguena, A. Alarcon, S. Allam, O. Alves, A. Amon, F. Andrade-Oliveira, J. Annis, S. Avila, D. Bacon, et al., Dark energy survey year 3 results: Cosmological constraints from galaxy clustering and weak lensing, Physical Review D 105 (2022) 023520
  [38] M. Asgari, C.-A. Lin, B. Joachimi, B. Giblin, C. Heymans, H. Hildebrandt, A. Kannawadi, B. Stölzner, T. Tröster, J. L. van den Busch, et al., Kids-1000 cosmology: Cosmic shear constraints and comparison between two point statistics, Astronomy & Astrophysics 645 (2021) A104
  [39] D. Sharma, B. Dai, F. Villaescusa-Navarro, and U. Seljak, A field-level emulator for modelling baryonic effects across hydrodynamic simulations, Monthly Notices of the Royal Astronomical Society 538 (2025) 1415
  [40] A. Mead, S. Brieden, T. Tröster, and C. Heymans, Hmcode-2020: Improved modelling of non-linear cosmological power spectra with baryonic feedback, Monthly Notices of the Royal Astronomical Society 502 (2021) 1401
  [41] A. Amon and G. Efstathiou, A non-linear solution to the S8 tension?, Monthly Notices of the Royal Astronomical Society 516 (2022) 5355
  [42] J. C. de Janvry, B. Dai, S. Gontcho, U. Seljak, and T. Zhang, Cosmic shear constraints from hsc year 3 with clustering calibration of the tomographic redshift distributions from desi, arXiv preprint arXiv:2511.18134 (2025)
  [43] K. Diao, B. Dai, and U. Seljak, Detecting modeling bias with continuous time flow models on weak lensing maps, JCAP 08 (2025) 004, arXiv:2505.00632 [astro-ph.CO]
  [44] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, Pytorch: An imperative style, high-performance deep learning library, 2019