A machine-learning photometric classifier for massive stars in nearby galaxies II. The catalog

A. Z. Bonanos; E. Christodoulou; E. Zapartas; F. Tramper; G. Maravelias; G. Muñoz-Sanchez; K. Antoniadis; K. Kovlakas; P. Bonfini; S. Avgousti

arxiv: 2504.01232 · v2 · pith:ZUOC6WSNnew · submitted 2025-04-01 · 🌌 astro-ph.GA · astro-ph.SR

G. Maravelias, A. Z. Bonanos, K. Antoniadis, G. Muñoz-Sanchez, E. Christodoulou, S. de Wit, E. Zapartas, K. Kovlakas, F. Tramper, P. Bonfini, S. Avgousti

Pith reviewed 2026-05-22 21:20 UTC · model grok-4.3

keywords machine learning · photometric classification · red supergiants · massive stars · nearby galaxies · stellar catalog · yellow hypergiants · metallicity trends

The pith

Machine learning applied to photometry classifies 1.15 million sources across 26 galaxies within 5 Mpc, yielding 120k red supergiants and several objects exceeding known luminosity limits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies a photometric machine-learning classifier, trained on Spitzer mid-infrared and Pan-STARRS1 optical data, to point sources in 26 galaxies spanning metallicities from 0.07 to 1.36 solar. It produces classifications for 1,147,650 sources, of which 276,657 are rated robust, including 120,479 red supergiants. It also reports 21 luminous red supergiants (log(L/L⊙) ≥ 5.5) and, in M31, 6 extreme red supergiants (log(L/L⊙) ≥ 6). The work flags 159 dusty yellow hypergiants in M31 and M33 and supplies the largest spectroscopically confirmed sample of extragalactic massive stars beyond the Magellanic Clouds. A reader would care because the catalog provides targets for detailed follow-up on episodic mass loss and a statistical base for studying how massive-star populations change with metallicity.

Core claim

The central claim is that the machine-learning model from the preceding paper can be applied at scale to Spitzer and optical photometry to deliver reliable classifications of evolved massive stars. The resulting catalog contains 1,147,650 sources (276,657 robust), including 120,479 red supergiants, 21 luminous red supergiants with log(L/L⊙) ≥ 5.5, 159 dusty yellow hypergiants in M31 and M33, and 6 extreme red supergiants with log(L/L⊙) ≥ 6 in M31. Performance is reported to remain good down to ~0.1 Z⊙ and at distances under 1.5 Mpc, with a slight loss of accuracy beyond ~3 Mpc.
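
The two luminosity cuts in this claim can be illustrated with a minimal sketch. The thresholds (5.5 and 6.0 in log(L/L⊙)) come from the abstract; the function name, the tier labels, the treatment of the tiers as disjoint, and the example values are assumptions made here for illustration and are not taken from the catalog.

```python
# Thresholds quoted in the abstract, in log(L/Lsun).
LUMINOUS_LOG_L = 5.5
EXTREME_LOG_L = 6.0


def luminosity_tier(log_l):
    """Return a tier label for a red supergiant with luminosity log(L/Lsun).

    Assumption: tiers are treated as disjoint, so an 'extreme' object is
    not also counted as 'luminous'. The paper may count them differently.
    """
    if log_l >= EXTREME_LOG_L:
        return "extreme"
    if log_l >= LUMINOUS_LOG_L:
        return "luminous"
    return "ordinary"


# Hypothetical luminosities, for illustration only.
example_log_l = [4.8, 5.5, 5.9, 6.0, 6.2]
tiers = [luminosity_tier(x) for x in example_log_l]
print(tiers)
```

Both cuts are inclusive (≥), matching how the thresholds are written in the abstract.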

What carries the argument

The machine-learning photometric classifier trained on combined Spitzer and Pan-STARRS1 photometry that assigns evolutionary classes to point sources after Gaia foreground removal.

If this is right

The catalog supplies targets for James Webb Space Telescope observations of luminous red supergiants and yellow hypergiants.
Follow-up spectroscopy of the flagged extreme objects can test whether current luminosity limits for red supergiants need revision.
Trends of class fractions with metallicity can be measured across the 0.07–1.36 Z⊙ range covered by the 26 galaxies.
The 5,273 spectroscopically confirmed massive-star candidates form the largest such extragalactic sample beyond the Clouds for statistical studies.
Individual-object studies of mass-loss episodes become feasible for the newly identified luminous and dusty sources.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending the same photometry-plus-ML approach to future wide-field infrared surveys could enlarge the sample to tens of millions of sources once resolution limits are overcome.
The six extreme red supergiants in M31 offer a direct test case for whether binary interaction or other channels allow stars to exceed the single-star luminosity ceiling.
Cross-matching the catalog with time-domain surveys would enable statistical measurement of variability amplitudes as a function of evolutionary class and metallicity.
The catalog's distance and metallicity coverage supplies a ready benchmark set for testing next-generation stellar-evolution codes that include episodic mass loss.

Load-bearing premise

The machine-learning model trained on higher-metallicity data continues to assign accurate classes at metallicities as low as 0.1 solar and distances up to a few megaparsecs.

What would settle it

Spectroscopic observations of the 6 extreme red supergiants and a random subset of the 120k red-supergiant candidates to test whether their luminosities and spectral types match the photometric assignments.

Original abstract

Mass loss is a key aspect of stellar evolution, particularly in evolved massive stars, yet episodic mass loss remains poorly understood. To investigate this, we need evolved massive stellar populations across various galactic environments. However, spectral classifications are challenging to obtain in large numbers, especially for distant galaxies. We addressed this by leveraging machine-learning techniques. We combined Spitzer photometry and Pan-STARRS1 optical data to classify point sources in 26 galaxies within 5 Mpc, and a metallicity range 0.07-1.36 Z⊙. Gaia data release 3 (DR3) astrometry was used to remove foreground sources. Classifications are derived using a machine-learning model developed in our previous work. We report classifications for 1,147,650 sources, with 276,657 sources (~24%) being robust. Among these are 120,479 red supergiants (RSGs; ~11%). The classifier performs well even at low metallicities (~0.1 Z⊙) and distances under 1.5 Mpc, with a slight decrease in accuracy beyond ~3 Mpc due to Spitzer's resolution limits. We also identified 21 luminous RSGs (log(L/L⊙) ≥ 5.5), 159 dusty yellow hypergiants in M31 and M33, as well as 6 extreme RSGs (log(L/L⊙) ≥ 6) in M31, challenging observed luminosity limits. Class trends with metallicity align with expectations, although biases exist. This catalog serves as a valuable resource for individual-object studies and James Webb Space Telescope target selection. It enables the follow-up on luminous RSGs and yellow hypergiants to refine our understanding of their evolutionary pathways. Additionally, we provide the largest spectroscopically confirmed catalog of extragalactic massive stars and candidates to date, beyond the Clouds, comprising 5,273 sources (including ~330 other objects).
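
The percentages in the abstract can be checked directly from its source counts. The short calculation below uses only the three counts quoted above. The abstract does not say which denominator its "~11%" uses, so the sketch computes the red-supergiant fraction against both the full classified sample and the robust subset; the variable names are illustrative.

```python
# Source counts quoted in the abstract.
total_classified = 1_147_650
robust = 276_657
red_supergiants = 120_479

robust_fraction = robust / total_classified
rsg_fraction_of_total = red_supergiants / total_classified
rsg_fraction_of_robust = red_supergiants / robust

print(f"robust / classified: {robust_fraction:.1%}")        # 24.1%
print(f"RSG / classified:    {rsg_fraction_of_total:.1%}")  # 10.5%
print(f"RSG / robust:        {rsg_fraction_of_robust:.1%}") # 43.5%
```

The robust fraction reproduces the quoted ~24%. Measured against all classified sources, the red-supergiant fraction comes out near 10.5%, close to the quoted ~11%.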

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a catalog paper that scales up the authors' prior ML classifier across 26 galaxies to give over a million classifications and useful counts of red supergiants, but the standout claims on extreme objects lack fresh quantitative validation in this work.

The main takeaway is that this paper applies the machine-learning classifier from the authors' earlier work to Spitzer and Pan-STARRS photometry in 26 galaxies within 5 Mpc. It produces classifications for 1.15 million sources, flags 277k as robust, and reports 120k red supergiants plus smaller numbers of luminous and extreme objects, including six extreme RSGs in M31 and 159 dusty yellow hypergiants in M31 and M33. Gaia DR3 cleaning removes foreground contaminants, and the output includes a spectroscopically confirmed subset of over 5k sources.

Referee Report

3 major / 2 minor

Summary. The paper applies an ML photometric classifier from a prior work (Part I) to combined Spitzer and Pan-STARRS1 photometry of point sources in 26 galaxies (0.07–1.36 Z⊙, <5 Mpc), after Gaia DR3 foreground removal. It reports classifications for 1,147,650 sources (276,657 robust, including 120,479 RSGs), plus 21 luminous RSGs (log L/L⊙ ≥5.5), 159 dusty YHGs in M31/M33, and 6 extreme RSGs (log L/L⊙ ≥6) in M31 that challenge observed limits; it also supplies a spectroscopically confirmed catalog of 5,273 extragalactic massive-star candidates. Performance is asserted to hold at low metallicity and <1.5 Mpc with only slight degradation beyond ~3 Mpc.

Significance. If the classifications prove reliable, the catalog would constitute a substantial resource for massive-star population studies across metallicities, enabling targeted follow-up of rare luminous objects and JWST target selection, while providing the largest spectroscopically confirmed sample beyond the Magellanic Clouds. The scale (over a million sources, ~24% robust) and explicit identification of extreme objects are strengths, but the absence of stratified quantitative metrics prevents full assessment of whether these results hold.

major comments (3)

[Abstract] Abstract (performance paragraph) and §3 (or equivalent methods/results section): the claim that the classifier 'performs well even at low metallicities (~0.1 Z⊙)' and distances <1.5 Mpc supplies no stratified accuracy, precision, recall, or confusion-matrix values for the RSG or YHG classes in the low-Z or distant regimes; without these, the reliability of the 276,657 robust classifications and the 6 extreme RSGs cannot be evaluated.
[Abstract] Abstract and results on extreme objects: the identification of 6 extreme RSGs (log L/L⊙ ≥6) in M31 and 21 luminous RSGs rests on application of the Part I model without reported validation metrics or bias analysis for the high-luminosity tail; this directly affects the claim that these objects challenge observed luminosity limits.
[Discussion] Discussion of class trends with metallicity: the statement that 'class trends with metallicity align with expectations, although biases exist' is not accompanied by quantitative assessment of selection biases or completeness as a function of metallicity or distance, which is load-bearing for interpreting the 120,479 RSGs and overall catalog statistics.

minor comments (2)

[Methods] Clarify the exact criteria used to designate the 276,657 sources as 'robust' (probability threshold, agreement across bands, etc.) in the methods or catalog description section.
[Results] Provide a table or figure showing the distribution of classifications by galaxy and metallicity bin to support the trend statements.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which highlight important areas for strengthening the quantitative support of our claims. We address each major comment below and will revise the manuscript to incorporate the requested analyses and clarifications.

Referee: [Abstract] Abstract (performance paragraph) and §3 (or equivalent methods/results section): the claim that the classifier 'performs well even at low metallicities (~0.1 Z⊙)' and distances <1.5 Mpc supplies no stratified accuracy, precision, recall, or confusion-matrix values for the RSG or YHG classes in the low-Z or distant regimes; without these, the reliability of the 276,657 robust classifications and the 6 extreme RSGs cannot be evaluated.

Authors: We agree that stratified performance metrics are necessary for a full evaluation. The current manuscript reports overall performance based on the Part I model and the spectroscopically confirmed subsample, but does not break it down by metallicity or distance bins. In the revised version we will add a dedicated subsection (or table) in §3 presenting accuracy, precision, recall, and confusion matrices for the RSG and YHG classes, stratified by metallicity bins (e.g., <0.2, 0.2–0.5, 0.5–1.0, >1.0 Z⊙) and distance bins (<1.5 Mpc, 1.5–3 Mpc, >3 Mpc), using the 5,273 confirmed sources where possible. revision: yes
Referee: [Abstract] Abstract and results on extreme objects: the identification of 6 extreme RSGs (log L/L⊙ ≥6) in M31 and 21 luminous RSGs rests on application of the Part I model without reported validation metrics or bias analysis for the high-luminosity tail; this directly affects the claim that these objects challenge observed luminosity limits.

Authors: The extreme and luminous RSGs are identified by applying the Part I classifier and then computing luminosities from the combined photometry. While Part I validated the model on a sample containing luminous stars, we did not provide a dedicated bias or completeness analysis for the high-luminosity tail in this work. We will revise the results and discussion sections to include such an analysis, drawing on the spectroscopic subsample and literature comparisons to quantify potential contamination or selection effects at log L/L⊙ ≥5.5. revision: yes
Referee: [Discussion] Discussion of class trends with metallicity: the statement that 'class trends with metallicity align with expectations, although biases exist' is not accompanied by quantitative assessment of selection biases or completeness as a function of metallicity or distance, which is load-bearing for interpreting the 120,479 RSGs and overall catalog statistics.

Authors: We acknowledge that the current discussion of metallicity trends is qualitative. In the revised manuscript we will expand this section with quantitative estimates of selection biases and completeness as functions of metallicity and distance. These will be derived from the distribution of the spectroscopically confirmed sample and from simple completeness simulations that account for photometric depth and Spitzer resolution limits. revision: yes
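
The stratified metrics discussed in the first response could be computed along the following lines. This is a minimal sketch, not the authors' procedure: the metallicity bin edges follow the bins proposed in that response, while the function names, the record layout, the class labels, and the toy records are hypothetical.

```python
from collections import defaultdict

# Metallicity bin edges (in Zsun) as proposed in the simulated response.
# A value equal to an edge is assigned to the upper bin.
BIN_EDGES = [0.2, 0.5, 1.0]
BIN_LABELS = ["<0.2", "0.2-0.5", "0.5-1.0", ">1.0"]


def metallicity_bin(z):
    """Map a metallicity in solar units to one of the four bin labels."""
    for edge, label in zip(BIN_EDGES, BIN_LABELS):
        if z < edge:
            return label
    return BIN_LABELS[-1]


def stratified_precision_recall(records, target="RSG"):
    """Compute per-bin precision and recall for one class.

    records: iterable of (metallicity, spectroscopic_class, predicted_class).
    Returns {bin_label: (precision, recall)}; an entry is None when undefined.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for z, true_cls, pred_cls in records:
        c = counts[metallicity_bin(z)]
        if pred_cls == target and true_cls == target:
            c["tp"] += 1
        elif pred_cls == target:
            c["fp"] += 1
        elif true_cls == target:
            c["fn"] += 1
    result = {}
    for label, c in counts.items():
        predicted = c["tp"] + c["fp"]
        actual = c["tp"] + c["fn"]
        precision = c["tp"] / predicted if predicted else None
        recall = c["tp"] / actual if actual else None
        result[label] = (precision, recall)
    return result


# Hypothetical toy records, for illustration only.
toy = [
    (0.10, "RSG", "RSG"),
    (0.10, "RSG", "YSG"),
    (0.15, "YSG", "RSG"),
    (0.40, "RSG", "RSG"),
    (1.20, "RSG", "RSG"),
    (1.20, "BSG", "BSG"),
]
metrics = stratified_precision_recall(toy)
for label in BIN_LABELS:
    if label in metrics:
        print(label, metrics[label])
```

The same function could be reused with distance bins by swapping the binning helper. Bins with no predicted or no true members return None, which keeps sparsely populated low-metallicity bins from reporting a misleading zero.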

Circularity Check

0 steps flagged

Catalog is new application of prior ML model to independent photometry; minor self-citation on performance not load-bearing

The paper derives its catalog by applying the machine-learning classifier developed in the authors' prior work (Part I) to fresh Spitzer + Pan-STARRS1 photometry across 26 galaxies, after Gaia DR3 foreground removal. The reported source counts, robust classifications (~24%), RSG numbers (~11%), and specific extreme-object identifications (21 luminous RSGs, 159 dusty YHGs, 6 extreme RSGs) are direct outputs of this application to new data and are not equivalent to the training inputs by construction. The abstract's performance claim at low metallicity references the previous paper but does not redefine or force the current catalog results. No self-definitional loops, fitted-input predictions, or uniqueness theorems appear in the derivation chain. The work is therefore largely self-contained as an application study against external photometric benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The catalog rests on the generalization of the Part I classifier and on standard assumptions about photometric data quality and foreground removal; no new physical entities are introduced.

free parameters (1)

ML classifier parameters
Trained on spectroscopic samples in the preceding paper; not re-derived here.

axioms (2)

domain assumption Gaia DR3 astrometry cleanly removes all foreground Milky Way contaminants from the Spitzer/Pan-STARRS1 point-source lists.
Invoked to produce the clean input catalog for classification.
domain assumption Spitzer 3.6/4.5 µm and Pan-STARRS1 g,r,i,z,y photometry together contain sufficient information to separate RSGs, yellow hypergiants, and other massive-star types at the metallicities and distances of the target galaxies.
Core premise enabling the machine-learning step.
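
The first axiom concerns foreground removal with Gaia DR3 astrometry. The source does not state the selection criteria, so the following is only a sketch of one common approach: flag a source as foreground when its parallax or proper motion is significantly non-zero, on the assumption that genuine members of galaxies at these distances have negligible values of both. The 3-sigma threshold, the function name, and the toy sources are assumptions; the dictionary keys follow Gaia DR3 column names.

```python
def is_foreground(src, n_sigma=3.0):
    """Flag a likely Milky Way foreground source from Gaia astrometry.

    Assumption: members of the target galaxies have parallax and proper
    motion consistent with zero, so a significant positive parallax or a
    significant proper motion marks a foreground star. The n_sigma value
    is a hypothetical choice, not the paper's criterion.
    """
    def ratio(value, error):
        # Signal-to-noise ratio; 0.0 if the error is missing or invalid.
        return value / error if error and error > 0 else 0.0

    parallax_significant = ratio(src["parallax"], src["parallax_error"]) > n_sigma
    pm_significant = (
        abs(ratio(src["pmra"], src["pmra_error"])) > n_sigma
        or abs(ratio(src["pmdec"], src["pmdec_error"])) > n_sigma
    )
    return parallax_significant or pm_significant


# Hypothetical sources (mas and mas/yr), for illustration only.
nearby_star = {"parallax": 2.0, "parallax_error": 0.1,
               "pmra": 5.0, "pmra_error": 0.1,
               "pmdec": -3.0, "pmdec_error": 0.1}
distant_source = {"parallax": 0.02, "parallax_error": 0.1,
                  "pmra": 0.05, "pmra_error": 0.1,
                  "pmdec": -0.1, "pmdec_error": 0.1}

flags = [is_foreground(nearby_star), is_foreground(distant_source)]
print(flags)
```

A cut of this kind cannot remove foreground stars too faint for Gaia to measure, which is one reason the axiom's "cleanly removes all" wording is a strong assumption.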
