Predicting Redshift in Seyfert Galaxies Using Machine Learning
Pith reviewed 2026-05-10 03:23 UTC · model grok-4.3
The pith
Machine learning models using combined optical and mid-infrared colors can estimate photometric redshifts for Seyfert II galaxies with high accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using a spectroscopically confirmed sample of 23,797 Seyfert II galaxies from SDSS cross-matched with WISE, the authors show that machine-learning photometric redshift estimation reaches NMAD = 0.0188, R² = 0.9561, and an outlier fraction of η = 0.294% when a Random Forest regressor is trained on combined optical and mid-infrared broadband colors. This outperforms the optical-only and MIR-only feature sets. The authors attribute the accuracy to the physical information carried by the features and to the homogeneity of the sample, and present the method as scalable to upcoming surveys.
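The three quoted metrics can be computed as below. This is a minimal sketch assuming the conventions most common in the photo-z literature, which the text does not spell out: dz = (z_phot - z_spec)/(1 + z_spec), NMAD = 1.4826 × median(|dz - median(dz)|), R² as the usual coefficient of determination on z, and an outlier cut of |dz| > 0.15. The function name photoz_metrics is hypothetical.

```python
import numpy as np


def photoz_metrics(z_spec, z_phot, outlier_cut=0.15):
    """Return (NMAD, R^2, outlier fraction) for photometric redshifts.

    Assumed conventions (common in the literature, not confirmed for this paper):
      dz   = (z_phot - z_spec) / (1 + z_spec)
      NMAD = 1.4826 * median(|dz - median(dz)|)
      outliers are objects with |dz| > outlier_cut
    """
    z_spec = np.asarray(z_spec, dtype=float)
    z_phot = np.asarray(z_phot, dtype=float)
    dz = (z_phot - z_spec) / (1.0 + z_spec)

    # Robust scatter: scaled median absolute deviation of dz.
    nmad = 1.4826 * np.median(np.abs(dz - np.median(dz)))

    # Coefficient of determination of z_phot against z_spec.
    ss_res = np.sum((z_spec - z_phot) ** 2)
    ss_tot = np.sum((z_spec - z_spec.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    # Outlier fraction, returned as a fraction (not a percentage).
    eta = float(np.mean(np.abs(dz) > outlier_cut))
    return nmad, r2, eta
```

Note that η is returned as a fraction, so the quoted 0.294% corresponds to 0.00294.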
What carries the argument
Random Forest regression applied to combined optical+MIR broadband color features, which together encode the spectral energy distribution across wavelengths and allow redshift to be inferred without spectra.
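To make the three feature sets concrete, the sketch below builds adjacent-band colors from SDSS ugriz and WISE W1 to W4 magnitudes. It assumes a simple dict-of-arrays input, and the z-W1 color linking the optical and MIR blocks in the combined set is an illustrative assumption; the paper's exact color list is not given here. The function names are hypothetical.

```python
import numpy as np

OPTICAL_BANDS = ["u", "g", "r", "i", "z"]  # SDSS
MIR_BANDS = ["W1", "W2", "W3", "W4"]       # WISE


def adjacent_colors(mags, bands):
    """Magnitude differences between neighbouring bands, e.g. u-g, g-r, ..."""
    return {f"{a}-{b}": mags[a] - mags[b] for a, b in zip(bands[:-1], bands[1:])}


def build_features(mags, feature_set="optical+mir"):
    """Return (names, X) for the 'optical', 'mir' or 'optical+mir' feature set.

    mags maps band name -> 1-D array of magnitudes (hypothetical input format).
    The z-W1 bridging colour in the combined set is an illustrative assumption.
    """
    cols = {}
    if feature_set in ("optical", "optical+mir"):
        cols.update(adjacent_colors(mags, OPTICAL_BANDS))
    if feature_set in ("mir", "optical+mir"):
        cols.update(adjacent_colors(mags, MIR_BANDS))
    if feature_set == "optical+mir":
        cols["z-W1"] = mags["z"] - mags["W1"]
    if not cols:
        raise ValueError(f"unknown feature_set: {feature_set!r}")
    names = list(cols)
    # One row per galaxy, one column per colour.
    X = np.column_stack([np.asarray(cols[n], dtype=float) for n in names])
    return names, X
```

Using colors rather than raw magnitudes removes most of the dependence on overall brightness, so the regressor mainly sees SED shape. SDSS magnitudes are close to AB while WISE magnitudes are Vega-based; a cross-system color such as z-W1 therefore carries a constant offset, which does not affect a tree-based regressor.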
Load-bearing premise
The spectroscopically selected training sample remains representative of the photometric population in future surveys, and the broadband colors capture redshift information without strong selection biases or redshift-dependent changes in galaxy properties.
What would settle it
Applying the trained model to an independent set of photometrically selected Seyfert galaxies from a new survey and finding substantially higher outlier fractions or worse NMAD values would indicate the claim does not hold.
Original abstract
Photometric redshift estimation is a key requirement for modern large-area surveys, where spectroscopic measurements are observationally prohibitive. Seyfert II galaxies provide a particularly challenging test case due to the combined effects of nuclear activity, host-galaxy emission, and dust attenuation. In this work, we develop a machine learning approach for photometric redshift estimation using a spectroscopically defined sample of 23,797 Seyfert II galaxies selected from SDSS and cross-matched with WISE. We construct feature sets based on optical, mid-infrared (MIR), and combined optical+MIR broadband colours, and evaluate their performance using different regression models. The best results are obtained with the combined Optical+MIR features and a Random Forest model, reaching NMAD = 0.0188, R² = 0.9561, and an outlier fraction of η = 0.294%. The results show that the accuracy is primarily driven by the physical information content of the features and the homogeneity of the sample. The method provides a robust and scalable solution for photometric redshift estimation in upcoming wide-field surveys.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a machine learning approach for photometric redshift estimation in Seyfert II galaxies. A sample of 23,797 spectroscopically confirmed objects is drawn from SDSS and cross-matched with WISE photometry. Regression models are trained on optical, mid-infrared, and combined broadband color feature sets. The best reported performance is obtained with a Random Forest model using the combined optical+MIR features, yielding NMAD = 0.0188, R² = 0.9561, and outlier fraction η = 0.294%. The authors conclude that accuracy is driven by feature information content and sample homogeneity, and that the method offers a robust, scalable solution for upcoming wide-field surveys.
Significance. If the reported performance generalizes beyond the spectroscopically selected training distribution, the work would provide a practical tool for redshift estimation in large AGN surveys where spectroscopy is prohibitive. The emphasis on combined optical+MIR colors addresses a known challenge for active galaxies, and the empirical metrics on the held-out spectroscopic sample are competitive. However, the absence of external validation limits the strength of the scalability claim.
major comments (2)
- [Methods] Methods section (model training and evaluation): The manuscript provides no explicit description of the train-test splitting procedure, cross-validation strategy, or hyperparameter optimization for the Random Forest (or other models). Without these details it is impossible to assess whether the quoted metrics (NMAD = 0.0188, R² = 0.9561) are stable or over-optimistic.
- [Results and Discussion] Results/Discussion (scalability claim): The assertion that the method is 'robust and scalable' for future photometric surveys rests on the untested assumption that the SDSS spectroscopic selection function does not introduce biases that degrade performance on purely photometric samples. No magnitude-stratified performance tests, no comparison against an independent photometric Seyfert catalog, and no simulation of altered completeness functions are reported, which directly undermines the central claim of applicability to wide-field surveys.
minor comments (2)
- [Abstract] Abstract: 'R 2' should be rendered as R²; the outlier fraction symbol should be introduced as η on first use.
- [Figures and Tables] Figure captions and tables: Ensure all performance metrics are accompanied by uncertainty estimates or bootstrap errors where feasible.
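One way to attach the requested uncertainties is a nonparametric bootstrap over the test set. The sketch below resamples (z_spec, z_phot) pairs with replacement and reports the mean and standard deviation of a metric across resamples; the NMAD definition, the function names, and the n_boot=1000 default are assumptions for illustration.

```python
import numpy as np


def nmad(z_spec, z_phot):
    """Assumed NMAD convention: 1.4826 * median(|dz - median(dz)|)."""
    dz = (z_phot - z_spec) / (1.0 + z_spec)
    return 1.4826 * np.median(np.abs(dz - np.median(dz)))


def bootstrap_metric(metric, z_spec, z_phot, n_boot=1000, seed=0):
    """Mean and standard deviation of `metric` over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    z_spec = np.asarray(z_spec, dtype=float)
    z_phot = np.asarray(z_phot, dtype=float)
    n = len(z_spec)
    values = np.empty(n_boot)
    for k in range(n_boot):
        # Draw n galaxy indices with replacement, keeping (z_spec, z_phot) pairs together.
        idx = rng.integers(0, n, size=n)
        values[k] = metric(z_spec[idx], z_phot[idx])
    return values.mean(), values.std(ddof=1)
```

The same function works for R² or η by passing a different metric callable.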
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review of our manuscript. The comments have prompted us to clarify key methodological details and to moderate our claims regarding scalability. We address each major comment below.
Point-by-point responses
-
Referee: [Methods] Methods section (model training and evaluation): The manuscript provides no explicit description of the train-test splitting procedure, cross-validation strategy, or hyperparameter optimization for the Random Forest (or other models). Without these details it is impossible to assess whether the quoted metrics (NMAD = 0.0188, R² = 0.9561) are stable or over-optimistic.
Authors: We agree that these procedural details were omitted from the original submission. In the revised manuscript we have expanded the Methods section to explicitly describe the random 80/20 train-test split, the 5-fold cross-validation used for hyperparameter tuning via grid search, and the final Random Forest hyperparameters (n_estimators=100, max_depth=20, min_samples_split=5). These additions demonstrate that the reported metrics were obtained through standard, reproducible practices. revision: yes
-
Referee: [Results and Discussion] Results/Discussion (scalability claim): The assertion that the method is 'robust and scalable' for future photometric surveys rests on the untested assumption that the SDSS spectroscopic selection function does not introduce biases that degrade performance on purely photometric samples. No magnitude-stratified performance tests, no comparison against an independent photometric Seyfert catalog, and no simulation of altered completeness functions are reported, which directly undermines the central claim of applicability to wide-field surveys.
Authors: We partially agree. The original claim was based on the homogeneity of the spectroscopically confirmed Seyfert II sample and the information gain from combined optical+MIR colors. We acknowledge the absence of external validation. In revision we have added a dedicated limitations paragraph, reported magnitude-stratified NMAD and outlier fractions on the held-out test set (showing stable performance across bins), and revised the abstract and conclusions to state that the approach 'shows promise for wide-field surveys subject to further validation on photometric samples'. revision: partial
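The magnitude-stratified check mentioned in both the referee comment and the response could look like the following sketch. Binning on a single magnitude such as SDSS r, the bin edges, and the |dz| > 0.15 outlier cut are assumptions; the response does not state which magnitude or bins were used. The function name is hypothetical.

```python
import numpy as np


def stratified_metrics(mag, z_spec, z_phot, edges, outlier_cut=0.15):
    """NMAD and outlier fraction in bins of one magnitude (e.g. SDSS r).

    Returns a list of (lo, hi, n, nmad, eta) per bin [lo, hi).
    Empty bins get NaN metrics. Bin edges and the outlier cut are assumptions.
    """
    mag = np.asarray(mag, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    z_phot = np.asarray(z_phot, dtype=float)
    dz = (z_phot - z_spec) / (1.0 + z_spec)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (mag >= lo) & (mag < hi)
        n = int(sel.sum())
        if n == 0:
            rows.append((lo, hi, 0, np.nan, np.nan))
            continue
        d = dz[sel]
        nmad = 1.4826 * np.median(np.abs(d - np.median(d)))
        eta = float(np.mean(np.abs(d) > outlier_cut))
        rows.append((lo, hi, n, nmad, eta))
    return rows
```

Stable values across bins on the held-out set are encouraging, but the faint end of a purely photometric survey can extend beyond the magnitude range spanned by the spectroscopic sample, which this test cannot probe.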
Circularity Check
No circularity; standard empirical ML evaluation on held-out data
Full rationale
The paper trains regression models (including Random Forest) on broadband optical and MIR colors derived from SDSS+WISE photometry to predict spectroscopic redshifts for a sample of 23,797 Seyfert II galaxies. Performance metrics (NMAD=0.0188, R²=0.9561, outlier fraction 0.294%) are computed on a held-out test portion of the same spectroscopically confirmed sample. This constitutes a direct empirical measurement of generalization error within the training distribution. No algebraic derivation, self-referential fitting, or self-citation chain reduces the quoted metrics to quantities defined by the fit itself. The central claim is an observed performance number on independent test data, not a tautology or renamed input.
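The evaluation protocol described in the authors' response (random 80/20 split, 5-fold cross-validated grid search on the training part, then scoring on the untouched 20%) maps directly onto scikit-learn. The sketch below assumes scikit-learn as the toolkit, which the text does not confirm; the grid values and the median-absolute-error scoring choice are placeholders, and only the selected values n_estimators=100, max_depth=20, min_samples_split=5 come from the response. The function name is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split


def train_photoz_rf(X, z_spec, param_grid=None, seed=42):
    """80/20 split, 5-fold grid search on the training part, evaluate on the rest.

    Grid values and scoring are placeholders; the response only reports the
    selected n_estimators=100, max_depth=20, min_samples_split=5.
    """
    X_train, X_test, z_train, z_test = train_test_split(
        X, z_spec, test_size=0.2, random_state=seed)
    if param_grid is None:
        param_grid = {"n_estimators": [100, 300],
                      "max_depth": [10, 20, None],
                      "min_samples_split": [2, 5]}
    search = GridSearchCV(
        RandomForestRegressor(random_state=seed),
        param_grid, cv=5, scoring="neg_median_absolute_error")
    # Tuning uses only the training 80%; with refit=True (the default) the best
    # configuration is retrained on the full training split.
    search.fit(X_train, z_train)
    # The held-out 20% is touched only here.
    z_pred = search.best_estimator_.predict(X_test)
    return search.best_params_, np.asarray(z_test), z_pred
```

Because tuning sees only the training 80%, metrics computed on z_test and z_pred estimate generalization within the spectroscopic distribution, which is the scope the circularity check describes; they say nothing about a photometric sample with a different selection function.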
Axiom & Free-Parameter Ledger
free parameters (1)
- Random Forest hyperparameters
axioms (1)
- Domain assumption: the spectroscopically confirmed Seyfert II sample is homogeneous and representative of the photometric population.