Machine-learning clustering of close-in exoplanet populations: links to pebble accretion
Pith reviewed 2026-06-27 08:35 UTC · model grok-4.3
The pith
Clustering reveals early formation for massive close-in giants
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Two-stage Gaussian mixture model (GMM) clustering in a feature space of dynamical descriptors identifies sub-populations of close-in exoplanets. Mapped onto a pebble-accretion synthetic population in a three-dimensional parameter space, these sub-populations differ in formation timing, gas accretion, and solid growth histories; very-massive gas giants are preferentially linked to earlier formation epochs than the hot-giant and warm-Jupiter populations.
What carries the argument
Two-stage Gaussian mixture model for probabilistic clustering of dynamical planet-star interaction descriptors, mapped onto pebble-accretion formation models via gas availability, gas fraction, and ice-rock mass ratio.
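The abstract does not spell out how the two stages are chained. A minimal sketch of one plausible reading, assuming scikit-learn's GaussianMixture: a coarse GMM splits the standardized sample, then a second GMM is fitted inside each coarse cluster that has enough members. The function name two_stage_gmm, the component counts, the min_size threshold, and the toy data are all illustrative assumptions, not the authors' pipeline.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def two_stage_gmm(X, n_coarse=2, n_fine=2, min_size=30, seed=0):
    """Hypothetical two-stage scheme: coarse GMM, then a GMM inside each coarse cluster."""
    Xs = StandardScaler().fit_transform(X)  # put all descriptors on a common scale
    stage1 = GaussianMixture(n_components=n_coarse, n_init=5, random_state=seed).fit(Xs)
    coarse = stage1.predict(Xs)
    proba = stage1.predict_proba(Xs)  # soft (probabilistic) stage-1 memberships
    fine = np.zeros(len(Xs), dtype=int)
    for c in range(n_coarse):
        idx = np.flatnonzero(coarse == c)
        if len(idx) < min_size:
            continue  # too few members to split reliably; keep fine label 0
        stage2 = GaussianMixture(n_components=n_fine, n_init=5, random_state=seed)
        fine[idx] = stage2.fit(Xs[idx]).predict(Xs[idx])
    return coarse, fine, proba


# Toy data only (not exoplanet measurements): four blobs in a 2-D feature space.
rng = np.random.default_rng(0)
centres = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
X = np.vstack([rng.normal(c, 0.3, size=(60, 2)) for c in centres])
coarse, fine, proba = two_stage_gmm(X)
```

Each planet ends up with a (coarse, fine) label pair plus stage-1 membership probabilities, which is what makes the clustering probabilistic rather than a hard partition.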
If this is right
- Very-massive gas giants form at earlier epochs than hot giants and warm Jupiters.
- Clusters show distinct gas accretion and solid growth histories.
- The mapping provides a statistical link between observed populations and theoretical formation pathways.
- Sub-populations emerge without imposed classification boundaries.
Where Pith is reading between the lines
- This clustering could be extended to include more exoplanet properties to test robustness against migration models.
- Independent age measurements of exoplanet systems might confirm the inferred formation epoch differences.
- The framework suggests that formation timing is a key driver of observed close-in architectures.
Load-bearing premise
The clusters from dynamical parameters reflect distinct pebble-accretion pathways rather than being primarily influenced by observational biases or post-formation migration.
What would settle it
If incorporating migration into the synthetic population models causes the mapped clusters to lose their distinct formation timing signatures, or if direct age determinations contradict the earlier formation for very-massive giants.
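Whether two mapped clusters keep "distinct formation timing signatures" can be phrased as a two-sample test. The sketch below uses a generic permutation test on the difference of medians; this is a swapped-in technique chosen for illustration, since the source does not say which statistic the authors use. The function name and the formation times (randomly generated, in arbitrary Myr-like units) are placeholders, not results from the paper.

```python
import random
from statistics import median


def permutation_pvalue(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test on the absolute difference of medians."""
    rng = random.Random(seed)
    observed = abs(median(a) - median(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break any link between value and cluster label
        diff = abs(median(pooled[:len(a)]) - median(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


# Invented placeholder epochs for two hypothetical mapped clusters.
toy = random.Random(1)
t_massive = [toy.gauss(0.5, 0.2) for _ in range(40)]
t_hot = [toy.gauss(1.5, 0.2) for _ in range(40)]

p_separated = permutation_pvalue(t_massive, t_hot)
p_identical = permutation_pvalue(t_massive, t_massive[::-1])
```

Running the same test before and after adding migration to the synthetic population would show whether the timing separation survives; a p-value that drifts toward 1 would indicate the signature is lost.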
Original abstract
Close-in exoplanets exhibit a wide range of orbital architectures and physical properties shaped by both formation conditions and migration processes. Although population-synthesis models predict distinct planetary populations, establishing a quantitative connection between observed exoplanets and synthetic populations remains challenging. We investigate the intrinsic organisation of close-in exoplanets using physically motivated dynamical parameters and connect the resulting populations to pebble-accretion formation pathways. A two-stage Gaussian mixture model (GMM) is applied to an observed sample of close-in exoplanets, performing unsupervised probabilistic clustering in a feature space dominated by dynamical descriptors of planet-star interactions. The resulting clusters are mapped onto a pebble-accretion synthetic population within a statistically motivated three-dimensional parameter space. Formation-related quantities, including gas availability, gas fraction, and ice-rock mass ratio, are then used to interpret the mapped populations. We identify statistically supported sub-populations without imposing predefined classification boundaries, including very-massive gas giants, hot giants, warm-Jupiter-dominated systems, and lower-mass giants. The mapped synthetic populations reveal systematic differences in formation timing, gas accretion, and solid growth histories. In particular, very-massive gas giants are preferentially associated with earlier formation epochs than hot-giant and warm-Jupiter-dominated populations. These results demonstrate that physically motivated machine-learning approaches can provide a statistically robust framework for linking observed exoplanet populations to theoretical planet formation pathways.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript applies a two-stage Gaussian mixture model (GMM) to perform unsupervised clustering of close-in exoplanets in a feature space dominated by dynamical descriptors of planet-star interactions. The resulting clusters are mapped onto a pebble-accretion synthetic population in a three-dimensional parameter space (gas availability, gas fraction, ice-rock mass ratio) to interpret formation-related quantities and identify sub-populations (very-massive gas giants, hot giants, warm-Jupiter-dominated systems, lower-mass giants). The mapped populations are reported to show systematic differences in formation timing, gas accretion, and solid growth histories, with very-massive gas giants preferentially linked to earlier formation epochs than other populations.
Significance. If the cluster-to-synthetic mapping can be shown to isolate formation pathways independently of post-formation migration and detection biases, the work would supply a statistically motivated framework for connecting observed dynamical architectures to pebble-accretion theory. The approach is novel in its use of physically motivated dynamical features for unsupervised clustering, but its significance hinges on whether the reported timing differences survive explicit forward-modeling of Type-II migration and radial-velocity/transit selection effects.
Major comments (2)
- The central mapping of GMM clusters onto the pebble-accretion synthetic population in the 3D space of gas availability, gas fraction, and ice-rock mass ratio is not demonstrated to be robust against Type-II migration and disk-driven evolution, which are known to modify the dynamical descriptors used for clustering. Without an explicit forward-modeling test of these post-formation effects on the observed sample, the reported systematic differences in formation timing (very-massive gas giants tied to earlier epochs) cannot be isolated from the alternative that clusters primarily reflect survival and observability filters.
- No sample size, cluster-validation metrics (e.g., BIC, silhouette scores, or stability under bootstrap), or error analysis on the two-stage GMM is supplied, even though the abstract asserts that the sub-populations are statistically supported. This information is load-bearing for the claim that the clusters reveal distinct formation pathways rather than being shaped by the feature-space construction or normalization choices.
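The validation metrics the referee asks for can be computed in a few lines. A minimal sketch, assuming scikit-learn: a BIC scan over candidate component counts, a silhouette score for the selected model, and a bootstrap stability estimate. Using the adjusted Rand index as the stability measure is an assumption made here, as are the function names and the toy data; none of this comes from the manuscript.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.mixture import GaussianMixture


def bic_scan(X, k_values, seed=0):
    """Return {k: BIC} for GMMs fitted with each candidate component count."""
    return {
        k: GaussianMixture(n_components=k, n_init=3, random_state=seed).fit(X).bic(X)
        for k in k_values
    }


def bootstrap_stability(X, k, n_boot=20, seed=0):
    """Mean adjusted Rand index between reference labels and bootstrap refits."""
    rng = np.random.default_rng(seed)
    reference = GaussianMixture(n_components=k, random_state=seed).fit(X).predict(X)
    scores = []
    for b in range(n_boot):
        sample = X[rng.integers(0, len(X), size=len(X))]  # resample with replacement
        refit = GaussianMixture(n_components=k, random_state=seed + b + 1).fit(sample)
        # Label the full sample with the refit model and compare partitions.
        scores.append(adjusted_rand_score(reference, refit.predict(X)))
    return float(np.mean(scores))


# Toy data only: three blobs standing in for a standardized feature matrix.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.4, size=(80, 2)) for c in [(0, 0), (4, 0), (0, 4)]])

bics = bic_scan(X, range(1, 6))
best_k = min(bics, key=bics.get)  # lower BIC is better
labels = GaussianMixture(n_components=best_k, n_init=3, random_state=0).fit(X).predict(X)
sil = silhouette_score(X, labels) if len(set(labels)) > 1 else float("nan")
stability = bootstrap_stability(X, best_k)
```

A stability value near 1 means the partition is reproduced under resampling; values well below 1 would suggest the clusters depend on the particular sample or on the feature-space construction.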
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed report. The comments highlight important limitations in our current analysis, and we address each point below with plans for revision.
- Referee: The central mapping of GMM clusters onto the pebble-accretion synthetic population in the 3D space of gas availability, gas fraction, and ice-rock mass ratio is not demonstrated to be robust against Type-II migration and disk-driven evolution, which are known to modify the dynamical descriptors used for clustering. Without an explicit forward-modeling test of these post-formation effects on the observed sample, the reported systematic differences in formation timing (very-massive gas giants tied to earlier epochs) cannot be isolated from the alternative that clusters primarily reflect survival and observability filters.
  Authors: We agree this is a substantive limitation. The clustering relies on observed dynamical features that can be altered by Type-II migration, and our mapping to the synthetic population does not include explicit forward modeling to isolate formation timing from post-formation effects. In revision we will add a dedicated limitations subsection discussing how migration and selection biases could influence the dynamical descriptors and the resulting formation-epoch interpretations. We will temper the claims to emphasize that the reported differences are suggestive links rather than fully isolated formation pathways, and note that a full forward-modeling test lies beyond the present scope.
  Revision: yes
- Referee: No sample size, cluster-validation metrics (e.g., BIC, silhouette scores, or stability under bootstrap), or error analysis on the two-stage GMM is supplied, even though the abstract asserts that the sub-populations are statistically supported. This information is load-bearing for the claim that the clusters reveal distinct formation pathways rather than being shaped by the feature-space construction or normalization choices.
  Authors: We accept that these quantitative details are required to support the statistical claims. The revised manuscript will report the exact sample size of the close-in exoplanet catalog, the BIC and silhouette scores used to select the number of GMM components in each stage, bootstrap resampling results for cluster stability, and uncertainty estimates on cluster assignments and mapped parameters. These will be presented in a new methods subsection on clustering validation.
  Revision: yes
Circularity Check
No circularity: unsupervised clustering on observed dynamics mapped to independent synthetics
The derivation applies a two-stage GMM to observed close-in exoplanets using dynamical descriptors of planet-star interactions, then maps the resulting clusters onto a separate pebble-accretion synthetic population in a 3D parameter space of gas availability, gas fraction and ice-rock mass ratio. No quoted step defines the clusters or the mapping in terms of the formation-timing outcomes being interpreted, nor does any prediction reduce by construction to a fitted parameter or self-citation chain. The chain from data-driven clustering to interpretation of synthetic populations remains independent of the target claims.