LEGGOS III: Mapping Star Formation and Dust in Gravitationally Lensed Galaxies with SUMAC, a UMAP and Clustering Framework
Pith reviewed 2026-06-26 16:25 UTC · model grok-4.3
The pith
An unsupervised pipeline segments JWST spaxel SEDs of a lensed galaxy into six physically distinct stellar and nebular populations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The SUMAC pipeline combines UMAP manifold embedding with HDBSCAN density clustering applied to spaxel spectral energy distributions, recovering six physically distinct stellar/nebular populations. The cluster median SEDs separate cleanly on the presence and strength of Hβ+[OIII], Hα+[NII], β_NUV slope, Balmer break strength, and the Balmer decrement, with bluer clusters tracing unobscured star-forming regions and progressively redder clusters tracing dusty star-forming regions.
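Two of the diagnostics named above, the UV continuum slope and the Balmer decrement, reduce to short calculations. The sketch below is illustrative and not taken from the paper. It assumes β is defined through f_λ ∝ λ^β and fitted by least squares in log-log space, and that the decrement is converted to E(B-V) using a Case B intrinsic ratio of 2.86 and Calzetti-like curve values k(Hβ) = 4.60 and k(Hα) = 3.33. The wavelength window, the function names, and the example ratio of 4.0 are hypothetical. At PRISM resolution Hα is blended with [NII], so a real measurement would need a deblending step that is not shown here.

```python
import math


def fit_power_law_slope(wavelengths, fluxes):
    """Least-squares slope of log10(flux) against log10(wavelength).

    For f_lambda proportional to lambda**beta, the returned slope is beta.
    """
    xs = [math.log10(w) for w in wavelengths]
    ys = [math.log10(f) for f in fluxes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def ebv_from_balmer_decrement(ha_over_hb, intrinsic=2.86, k_hb=4.60, k_ha=3.33):
    """Colour excess E(B-V) implied by an observed H-alpha/H-beta flux ratio.

    Assumed defaults: Case B intrinsic ratio and Calzetti-like k values.
    """
    return 2.5 / (k_hb - k_ha) * math.log10(ha_over_hb / intrinsic)


# Hypothetical rest-frame NUV window (Angstrom) and a synthetic blue SED.
rest_wave = [1800.0, 2000.0, 2200.0, 2400.0, 2600.0]
blue_sed = [w ** -2.0 for w in rest_wave]

beta = fit_power_law_slope(rest_wave, blue_sed)
ebv = ebv_from_balmer_decrement(4.0)  # example ratio for a dusty cluster
```

Under these assumptions a bluer cluster would show a more negative β and a decrement close to 2.86, while a redder, dustier cluster would show a flatter β and a larger decrement.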
What carries the argument
SUMAC: UMAP-based manifold embedding combined with HDBSCAN density clustering applied to spectral energy distributions at the spaxel level.
If this is right
- The method automates identification of star-forming clumps in lensed JWST IFS data, reducing dependence on manual inspection.
- Cluster separation shows that SED shape alone can distinguish unobscured from dusty star formation without additional priors.
- The pipeline can be applied uniformly to other lensed galaxies at z approximately 2-4 observed with similar instruments.
- Median SEDs of the recovered clusters supply empirical templates for stellar and nebular properties in high-redshift systems.
Where Pith is reading between the lines
- Testing the same clustering on non-lensed or lower-resolution data would check whether the six-population structure persists outside strong lensing.
- Adding longer-wavelength photometry could tighten constraints on dust content within the redder clusters.
- The framework might be used to measure the obscured fraction of star formation across a statistical sample of z approximately 2.5 galaxies.
- Comparing cluster assignments against hydrodynamic simulations of galaxy assembly could test whether the recovered populations correspond to distinct evolutionary stages.
Load-bearing premise
The clusters produced by UMAP and HDBSCAN on spaxel SEDs represent physically distinct populations rather than artifacts of the embedding or clustering hyperparameters.
What would settle it
Re-running the pipeline on the same data with altered UMAP parameters or a different clustering algorithm produces clusters whose median SEDs no longer separate on emission lines, UV slope, Balmer features, or dust indicators.
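One way to make such a re-run quantitative is to compare the spaxel label assignments of two runs with the adjusted Rand index (ARI), which is 1 for identical partitions regardless of how clusters are numbered and near 0 for chance agreement. The sketch below is a plain-Python implementation for illustration. The ARI is a swapped-in agreement measure, not one the paper or report names, and the label lists are made up.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same spaxels."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two spaxels")

    # Contingency counts: spaxels sharing a label pair, and per-run cluster sizes.
    pair_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())

    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        # Both partitions are trivial (for example, a single cluster each).
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


# Hypothetical labels from two runs with different seeds or UMAP settings.
run_a = [0, 0, 1, 1, 2, 2]
run_b = [5, 5, 3, 3, 4, 4]   # same partition, clusters renumbered
run_c = [0, 1, 0, 1, 0, 1]   # unrelated partition

ari_same = adjusted_rand_index(run_a, run_b)
ari_diff = adjusted_rand_index(run_a, run_c)
```

A persistently high ARI across parameter changes would support the load-bearing premise. A low ARI would still need to be followed by the median-SED comparison described above, since the partition can shift while the physical separation survives.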
Original abstract
Strong gravitational lensing combined with JWST's spatio-spectral resolution enables resolved studies of star-forming regions in $z\sim$ 2-4 galaxies, but identifying and characterizing such regions in lensed integral-field and multi-band data remains a manual, observer-dependent process. We present $\texttt{SUMAC}$ (Software for the Uniform Manifold Approximation of Clumps), an unsupervised learning pipeline that segments JWST imaging and spectroscopy at the "spaxel" level by combining $\texttt{UMAP}$-based manifold embedding with $\texttt{HDBSCAN}$ density clustering applied to spectral energy distributions/spectra. We demonstrate the pipeline on JWST/NIRSpec PRISM IFS observations of the lensed galaxy SGAS111020.0+645950.8 at $z = 2.481$, recovering six physically distinct stellar/nebular populations. The cluster median SEDs separate cleanly on the presence and strength of H$\beta$+[OIII], H$\alpha$+[NII], $\beta_{NUV}$ slope, Balmer break strength, and the Balmer decrement, with bluer clusters tracing unobscured star-forming regions and progressively redder clusters tracing dusty star-forming regions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents SUMAC, an unsupervised pipeline applying UMAP manifold embedding followed by HDBSCAN density clustering directly to spaxel SED vectors from JWST/NIRSpec PRISM IFS observations of the lensed galaxy SGAS111020.0+645950.8 at z=2.481. It claims to recover six physically distinct stellar/nebular populations whose median SEDs separate cleanly on the presence/strength of Hβ+[OIII], Hα+[NII], β_NUV slope, Balmer break, and Balmer decrement, with bluer clusters tracing unobscured star-forming regions and redder clusters tracing dusty ones.
Significance. If the clusters map to physically distinct populations, the method would supply an objective, automated alternative to manual segmentation of resolved star-forming regions in gravitationally lensed z~2-4 galaxies, reducing observer dependence and enabling statistical studies of dust and star formation with JWST IFS data.
major comments (2)
- [Abstract] Abstract: the central claim that the six clusters correspond to physically distinct populations rests solely on post-hoc qualitative inspection of median SED separations in emission-line and continuum features; no error bars on the medians, silhouette scores or other validation metrics, comparison against manual segmentation, or recovery tests on simulated data with known ground-truth populations are reported.
- [Methods] Pipeline description: no systematic sweeps of UMAP (n_neighbors, min_dist) or HDBSCAN (min_cluster_size, min_samples) hyperparameters are presented, nor are alternative embeddings (PCA, t-SNE) or robustness checks across different random seeds or data subsets; without these, the clean six-cluster solution and the reported SED separations could be induced by the chosen embedding parameters rather than reflecting intrinsic physical components.
minor comments (1)
- Notation for β_NUV and the Balmer decrement should be defined explicitly on first use and used consistently in all figure captions and text.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which have helped us identify areas where the manuscript can be strengthened. We address each major comment below and outline the revisions we will make.
- Referee: [Abstract] Abstract: the central claim that the six clusters correspond to physically distinct populations rests solely on post-hoc qualitative inspection of median SED separations in emission-line and continuum features; no error bars on the medians, silhouette scores or other validation metrics, comparison against manual segmentation, or recovery tests on simulated data with known ground-truth populations are reported.
  Authors: We agree that the current abstract and results section rely primarily on qualitative assessment of the median SED separations. In the revised manuscript we will add error bars to all median SED plots, report silhouette scores (and optionally the Davies-Bouldin index) for the six-cluster solution, and include a direct comparison between the SUMAC clusters and a manual segmentation performed by the authors. Full recovery tests on simulated NIRSpec PRISM data with injected ground-truth populations are a valuable next step but would require a separate simulation framework and are beyond the scope of this demonstration paper; we will explicitly note this limitation and flag it for future work.
  Revision: partial
- Referee: [Methods] Pipeline description: no systematic sweeps of UMAP (n_neighbors, min_dist) or HDBSCAN (min_cluster_size, min_samples) hyperparameters are presented, nor are alternative embeddings (PCA, t-SNE) or robustness checks across different random seeds or data subsets; without these, the clean six-cluster solution and the reported SED separations could be induced by the chosen embedding parameters rather than reflecting intrinsic physical components.
  Authors: We acknowledge the absence of these robustness checks. The revised Methods section will contain a new subsection (or appendix) that systematically varies the key UMAP (n_neighbors, min_dist) and HDBSCAN (min_cluster_size, min_samples) parameters over plausible ranges and shows that the six-cluster solution and the associated SED separations remain stable. We will also present results using PCA as an alternative linear embedding and report clustering outcomes for multiple random seeds as well as for data subsets (e.g., different rest-frame wavelength ranges and spatial masks). These additions will demonstrate that the reported populations are not artifacts of a single hyperparameter choice.
  Revision: yes
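The silhouette score promised in the first response can be sketched in a few lines. For each point it compares the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b) and averages (b - a) / max(a, b) over all points. The version below is a plain-Python illustration, not the authors' code. It assumes Euclidean distance on whatever coordinates are supplied (embedding or SED space), and it assumes that any HDBSCAN noise points, which that algorithm labels -1, have been removed beforehand. The toy points are made up.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # Convention: a singleton cluster contributes a score of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy 2-D embedding: two tight, well-separated groups of spaxels.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = silhouette_score(toy_points, [0, 0, 1, 1])
bad_score = silhouette_score(toy_points, [0, 1, 0, 1])
```

Scores near 1 indicate compact, well-separated clusters and scores near or below 0 indicate overlapping or misassigned points. Density-based clusters can be non-convex, so a modest silhouette on its own would not rule out a meaningful segmentation.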
Circularity Check
No circularity; standard unsupervised pipeline whose claims are externally falsifiable
The paper applies UMAP manifold embedding followed by HDBSCAN clustering to spaxel SED vectors as a standard, off-the-shelf unsupervised segmentation method. The central claim that the resulting six clusters map to physically distinct populations rests on post-hoc inspection of median SED differences in emission lines and continuum features, not on any equation, fitted parameter, or self-citation that reduces the output to the input by construction. No derivation chain, uniqueness theorem, or ansatz is invoked; the method is externally falsifiable via hyperparameter sweeps or alternative embeddings, none of which are required for the absence of circularity. This is the normal case of a self-contained empirical application.
Axiom & Free-Parameter Ledger
free parameters (2)
- UMAP n_neighbors and min_dist
- HDBSCAN min_cluster_size and min_samples
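A sweep over these four parameters can be organised as a Cartesian grid, recording how many non-noise clusters each configuration returns. The sketch below is only scaffolding: the parameter ranges are hypothetical, `toy_pipeline` is a stand-in that returns made-up labels in place of an actual UMAP plus HDBSCAN run, and the -1 noise label follows the HDBSCAN convention.

```python
from itertools import product


def build_grid(param_ranges):
    """Expand a dict of parameter lists into a list of config dicts."""
    names = list(param_ranges)
    return [
        dict(zip(names, combo))
        for combo in product(*(param_ranges[name] for name in names))
    ]


def count_clusters(labels, noise_label=-1):
    """Number of distinct clusters, ignoring the noise label."""
    return len(set(labels) - {noise_label})


def sweep(run_pipeline, param_ranges, target_clusters=6):
    """Run the pipeline on every config; return results and stable fraction."""
    results = []
    for config in build_grid(param_ranges):
        labels = run_pipeline(**config)
        results.append((config, count_clusters(labels)))
    stable = sum(1 for _, k in results if k == target_clusters)
    return results, stable / len(results)


# Hypothetical ranges, chosen only to illustrate the grid.
PARAM_RANGES = {
    "n_neighbors": [10, 15, 30],
    "min_dist": [0.0, 0.1],
    "min_cluster_size": [20, 50],
    "min_samples": [5, 10],
}


def toy_pipeline(n_neighbors, min_dist, min_cluster_size, min_samples):
    """Stand-in for a real embedding and clustering run (made-up behaviour)."""
    n_clusters = 6 if min_cluster_size < 50 else 5
    return [-1] + [i % n_clusters for i in range(60)]


results, stable_fraction = sweep(toy_pipeline, PARAM_RANGES)
```

In a real check `toy_pipeline` would be replaced by a function that embeds and clusters the spaxel SEDs with the given settings. The cluster count is only a first filter, since two runs can both return six clusters with different memberships.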
axioms (1)
- Domain assumption: spaxel-level SEDs contain sufficient manifold structure to separate physically distinct stellar/nebular populations via density-based clustering.