pith. machine review for the scientific record.

arxiv: 2604.19690 · v1 · submitted 2026-04-21 · 🌌 astro-ph.SR · astro-ph.IM

Recognition: unknown

Is the 'Known' Enough? An Integrated Machine Learning Framework for Eclipsing Binary Classification and Parameter Estimation Based on Well-Characterized Systems

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 01:20 UTC · model grok-4.3

classification 🌌 astro-ph.SR · astro-ph.IM
keywords eclipsing binaries · machine learning · light curve classification · parameter estimation · photometric surveys · XGBoost · Random Forest · morphology classification

The pith

Machine learning ensembles trained on 845 well-characterized eclipsing binaries classify morphologies at 95% accuracy and estimate physical parameters like temperature ratios and inclinations with R-squared values from 0.77 to 0.92 on held-out data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a multi-task machine learning framework that performs simultaneous morphology classification and physical parameter estimation for eclipsing binaries directly from their photometric light curves. Models are trained on 51 domain-specific features extracted from phase-folded light curves of 845 well-characterized systems, with 15% held out for independent testing. Performance reaches 95.4% classification accuracy in cross-validation and strong regression scores on the test set, with further checks against OGLE and Kepler catalogs confirming that the outputs generalize when known biases are considered. This matters because large-scale surveys generate far more light curves than can be modeled individually with traditional methods, so an automated pipeline could scale detailed characterization across thousands of new systems.
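Every downstream feature assumes the light curve has first been folded on its orbital period. A minimal sketch of that step (the times and period below are illustrative, not values from the paper):

```python
import numpy as np

def phase_fold(times, period, t0=0.0):
    """Map observation times to orbital phase in [0, 1)."""
    return ((np.asarray(times) - t0) / period) % 1.0

# Illustrative values only; the paper's periods come from the source catalogs.
phases = phase_fold([0.0, 0.3, 1.1, 2.45], period=1.0)
```

Features such as eclipse widths and the O'Connell effect are then computed on the folded curve.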

Core claim

Random Forest and XGBoost ensemble models, trained on 51 features from phase-folded light curves of 845 well-characterized eclipsing binaries across three morphological classes, simultaneously classify system morphology with 95.4% cross-validation accuracy and 90.7% accuracy on a held-out test set while estimating effective temperature ratio (R² = 0.88), primary and secondary surface potentials (R² = 0.91 and 0.92), inclination (R² = 0.89), and mass ratio (R² = 0.77). Physics-guided post-processing is applied after the models, and independent validation against the OGLE OCVS catalog yields 0.99 contact recall across 104692 matched systems, while cross-matches with Kepler catalogs recover 77% classification accuracy with systematic parameter deviations consistent with known selection biases.

What carries the argument

The multi-task ensemble framework that extracts 51 domain-specific features from each phase-folded photometric light curve and feeds them to Random Forest and XGBoost models for joint classification and regression, followed by physics-guided post-processing.
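A hedged sketch of this joint setup, using scikit-learn Random Forests as stand-ins (the paper also uses XGBoost; the feature matrix and labels here are random placeholders, and the tree count is reduced from the paper's 500 for brevity):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(845, 51))           # 51 features per light curve (placeholder data)
y_class = rng.integers(0, 3, size=845)   # 0 = detached, 1 = semidetached, 2 = contact
y_params = rng.normal(size=(845, 5))     # Te2/Te1, Omega1, Omega2, i, q (placeholder)

# Joint setup: one classifier for morphology, one multi-output regressor
# for the five physical parameters.
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y_class)
reg = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y_params)

morph = clf.predict(X[:1])    # morphology class for one system
params = reg.predict(X[:1])   # all five parameters at once
```

The paper's physics-guided post-processing would then act on `params`; its exact implementation is not specified in the material above.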

If this is right

  • Thousands of high-confidence eclipsing binary candidates can be identified and assigned preliminary physical parameters from existing photometric survey data.
  • Morphological classifications achieve high recall (0.99 for contact systems) when cross-checked against the OGLE Online Catalog of Variable Stars across over 100,000 matched objects.
  • Cross-matching with independent Kepler catalogs confirms 77% classification accuracy and parameter recovery consistent with known selection biases, third-light dilution, and differences between photometric and spectroscopic methods.
  • The framework provides a scalable route to detailed astrophysical characterization for the volume of light curves produced by modern photometric surveys.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If applied to upcoming large-scale surveys, the same pipeline could supply initial parameter estimates for millions of systems and thereby prioritize targets for follow-up spectroscopy or multi-band photometry.
  • The feature set derived from phase-folded curves may transfer to other classes of periodic variables once suitable training labels become available.
  • Accounting explicitly for third-light contamination in the post-processing step could reduce the systematic deviations observed in the Kepler validation.
  • The approach could be extended to joint modeling with radial-velocity data when such measurements exist for a subset of targets, tightening the mass-ratio and temperature estimates.

Load-bearing premise

The 845 well-characterized systems used for training are representative of the broader population of eclipsing binaries encountered in large surveys, and the 51 extracted features capture the necessary information without significant information loss or selection bias from phase-folding and feature engineering.
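This premise can be probed directly: compare each feature's training-set distribution against a survey sample with a two-sample Kolmogorov–Smirnov test, as the paper does for class separability in Figure 2. A sketch on synthetic data (the distributions are invented to illustrate a shift):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feat = rng.normal(0.0, 1.0, size=845)    # one feature over the 845 training systems
survey_feat = rng.normal(0.5, 1.3, size=5000)  # same feature over a (shifted) survey sample

stat, p = ks_2samp(train_feat, survey_feat)
# A small p-value flags a distribution shift in this feature; repeating the
# test per feature maps where the training sample stops being representative.
```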

What would settle it

A comparison of the model's predicted parameters against independent spectroscopic or detailed photometric solutions for several hundred eclipsing binaries not included in the original 845-system training set would reveal whether the reported R-squared values hold outside the training distribution.
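Operationally, such a test reduces to computing R² between the model's predictions and the independent solutions. A minimal sketch with synthetic values (the scatter level is illustrative):

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
true_q = rng.uniform(0.1, 1.0, size=300)           # independent spectroscopic mass ratios
pred_q = true_q + rng.normal(0.0, 0.05, size=300)  # model predictions with 0.05 scatter

r2 = r2_score(true_q, pred_q)
# If r2 on such external systems falls well below the in-distribution 0.77,
# the model does not transfer outside its training distribution.
```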

Figures

Figures reproduced from arXiv: 2604.19690 by Burak Ulaş.

Figure 1
Figure 1. Resolution sensitivity of the 51 extracted features. Each panel shows results for subsampling at 100, 250, 500, and 750 phase points (the baseline is 1000 points). Feature deviation is measured in units of the training-set standard deviation (σ) of each feature. The horizontal dashed line marks the 10% threshold (0.10σ) used throughout the analysis. (a) Solid line: overall mean deviation across all 51 feat… view at source ↗
Figure 2
Figure 2. Parameter space of the 995 training systems (N_Detached = 421, N_Semidetached = 234, N_Contact = 340) shown in three projection planes: inclination vs. mass ratio (i, q; left), mass ratio vs. temperature ratio (q, Te2/Te1; middle), and primary vs. secondary surface potential (Ω1, Ω2; right). Class separability was quantified through two-sample Kolmogorov–Smirnov (KS) tests for each of the five predicted par… view at source ↗
Figure 3
Figure 3. Pearson correlation matrix of the five predicted parameters for the 995 training systems. view at source ↗
Figure 4
Figure 4. Top panel: RF hyperparameter sensitivity analysis on the cross-validation set. The plots show CV R² variation with the number of estimators (a), the maximum tree depth (b), and the maximum features per split (c). For panel (c), the numbers on the x-axis correspond to log2(51) ≈ 6, √51 ≈ 7, 0.3 × 51 ≈ 15, and 0.5 × 51 ≈ 26 features considered at each split out of the 51 total features. Bottom panel: XGB h… view at source ↗
Figure 5
Figure 5. Importance of phase-binned statistics for RF (left) and XGB (right) models. Regions covered by vertical dashed lines mark secondary and primary eclipse phases. view at source ↗
Figure 6
Figure 6. Comparison of three XGB-based regression architectures evaluated on the cross-validation set (5-fold stratified CV, 845 systems). Blue bars: five independent single-output XGBRegressor models (mean R² = 0.867); orange bars: a MultiOutputRegressor wrapper that fits one independent XGBRegressor per target (mean R² = 0.867); green bars: a RegressorChain of XGBRegressor models (mean R² = 0.863). … view at source ↗
Figure 7
Figure 7. Learning curves for XGB regression across five stellar parameters. Validation R² is shown as a function of training set size (10%–80% of the 845-system cross-validation pool). Each point represents the mean over five independent random subsamples; shaded bands indicate ±1 standard deviation. The held-out set was excluded throughout. view at source ↗
Figure 8
Figure 8. PCA dimensionality reduction experiment. Left: cumulative explained variance as a function of the number of principal components, with dashed lines marking the 90% (9 components), 95% (12 components), and 99% (20 components) thresholds. Right: mean 5-fold CV R² on the 845-system CV set as a function of the number of principal components for each parameter. view at source ↗
Figure 9
Figure 9. Predicted versus true values for XGB regression on the held-out test set. Diagonal black lines indicate perfect agreement while the purple dashed lines mark ±10% relative error bounds. view at source ↗
Figure 10
Figure 10. XGB residual distributions (predicted minus true) for the held-out test set. Histograms show residual density per parameter; red curves show fitted Gaussian distributions. Panel headers report the mean (µ) and standard deviation (σ) of each distribution. view at source ↗
Figure 11
Figure 11. Representative RF tree-prediction distributions for all five predicted parameters. For each cross-validation sample, predictions from all 500 individual decision trees are collected out-of-fold and a Gaussian Mixture Model (GMM with 1 or 2 components, selected by BIC) is fitted to the resulting histogram. Each panel shows a single representative system; the morphology label in brackets (D, SD, C) indicates… view at source ↗
Figure 12
Figure 12. Confusion matrices for XGB morphology classification. The left panel shows results on the 5-fold cross-validation set (N = 845, accuracy = 95.4%), while the right panel shows results on the held-out test set (accuracy = 90.7%). Each cell gives the number of systems and the row percentage. D, SD, and C denote detached, semidetached, and contact. view at source ↗
Figure 13
Figure 13. Feature distributions for correctly classified (green) and misclassified (red) semidetached systems. Left: O'Connell effect (flux difference between maxima). Right: primary eclipse width (phase units). The y-axis shows normalised probability density. view at source ↗
Figure 14
Figure 14. Parameter distributions for OGLE predictions. The histograms show the distribution of inclination (i), mass ratio (q), and effective temperature ratio (Te2/Te1) separated by morphological class (blue: detached, green: semidetached, red: contact). view at source ↗
Figure 15
Figure 15. Representative OGLE light curves for detached (top), semidetached (middle), and contact (bottom) systems selected by Mahalanobis distance filtering (DM ≤ median) and highest APC scores. Black points are processed light curves; grey dots are the raw catalog data. Vertical dashed lines mark primary and secondary eclipse phases. Panel legends show the OGLE-BLG-ELC identifier, APC score, Mahalanobis dista… view at source ↗
Figure 16
Figure 16. Same as … view at source ↗
Figure 17
Figure 17. Same as … view at source ↗
Figure 18
Figure 18. Parameter-space coverage analysis in PCA space (PC1 + PC2 account for 64.3% of variance). The filled blue density shows the KDE of 10000 randomly sampled OGLE survey targets; the red dashed contours represent the KDE of the 995 training systems. The two-dimensional KDE overlap coefficient is OVL = 0.47. view at source ↗
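The three regression architectures compared in Figure 6 correspond to standard scikit-learn constructions. A sketch using GradientBoostingRegressor as a stand-in for XGBRegressor (placeholder data, not the paper's):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor, RegressorChain

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 51))   # placeholder feature matrix
Y = rng.normal(size=(200, 5))    # five target parameters (placeholder)

base = GradientBoostingRegressor(n_estimators=50, random_state=0)

# (1) Five independent single-output models, fitted one per target column.
singles = [GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, Y[:, j])
           for j in range(Y.shape[1])]
# (2) A wrapper that fits one independent clone of `base` per target.
multi = MultiOutputRegressor(base).fit(X, Y)
# (3) A chain in which each model also sees the previous targets' predictions.
chain = RegressorChain(base).fit(X, Y)
```

The paper reports nearly identical mean R² for the first two (0.867) and a slight drop for the chain (0.863), consistent with weak inter-target dependence.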
read the original abstract

This study presents a multi-task machine learning framework for simultaneous morphology classification and physical parameter estimation of eclipsing binaries using photometric light curves. We train Random Forest and XGBoost ensemble models on 845 of 995 well-characterized systems comprising three morphological configurations by extracting 51 domain-specific features from each phase-folded light curve. To assess generalization, 15% of systems were withheld as an independent test set before any model training. On this held-out set, the XGBoost model yields $R^2$ values of 0.88 for the effective temperature ratio, 0.91 for the primary surface potential, 0.92 for the secondary surface potential, 0.89 for inclination, and 0.77 for the mass ratio. Morphology classification achieves 95.4% accuracy on the cross-validation set with per-class F1 scores exceeding 0.90, while the held-out test set confirms generalization with 90.7% accuracy. We present a catalog of estimated physical parameters and classifications for these systems, identifying thousands of high-confidence candidates. Morphological classifications are independently validated against the OGLE Online Catalog of Variable Stars (OCVS), achieving a contact recall of 0.99 across 104692 matched systems. The model's generalization capability is validated by cross-matching predictions with independent Kepler catalogs, achieving 77% classification accuracy and recovering physical parameters with systematic deviations consistent with known selection biases, third-light dilution, and methodological differences between photometric and spectroscopic approaches. This work confirms that machine learning ensembles, when coupled with physics-guided post-processing, can effectively bridge the gap between massive photometric surveys and detailed astrophysical characterization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a multi-task machine learning framework using Random Forest and XGBoost ensembles trained on 845 well-characterized eclipsing binary systems. It extracts 51 domain-specific features from phase-folded light curves to simultaneously classify morphology (detached, semi-detached, contact) and estimate physical parameters including effective temperature ratio, primary and secondary surface potentials, inclination, and mass ratio. Reported performance includes R² values of 0.77–0.92 on a 15% held-out test set, 95.4% morphology classification accuracy under cross-validation, and 90.7% accuracy on the held-out set. External validation shows high contact recall (0.99) against the OGLE OCVS catalog for 104692 matched systems and 77% classification accuracy with systematic parameter offsets when cross-matched to Kepler catalogs. The central claim is that ML ensembles combined with physics-guided post-processing can bridge the gap between massive photometric surveys and detailed astrophysical characterization.

Significance. If the generalization to survey-scale data holds, the framework could provide a scalable method for classifying and parameterizing large numbers of eclipsing binaries from surveys such as TESS or LSST, where full physical modeling is impractical. The use of an independent held-out test set, cross-validation, and external catalog cross-matches supplies concrete evidence supporting the supervised prediction task, though the extent to which the approach remains robust under realistic distribution shifts is central to its claimed utility.

major comments (2)
  1. [Abstract and Kepler validation section] The cross-match with independent Kepler catalogs yields only 77% classification accuracy together with systematic offsets in recovered parameters, explicitly attributed to selection biases, third-light dilution, and photometric-versus-spectroscopic differences. Because the central claim requires the framework to bridge well-characterized systems to blind survey detections, this performance degradation under distribution shift is load-bearing and needs quantitative mitigation or additional tests to substantiate the generalization assertion.
  2. [Training data description and methodology] The 845 training systems are selected as 'well-characterized,' which by construction biases the sample toward objects with prior high-quality follow-up and therefore differs in period, amplitude, SNR, and morphology from typical blind survey detections. The 15% held-out test set remains inside this distribution, and the 51 phase-folded features plus post-processing have not been shown to be invariant under the shift that defines the target application, as indicated by the Kepler results.
minor comments (2)
  1. [Abstract] No details are provided on the hyperparameter tuning procedure, the feature selection process that produced the 51 domain-specific features, uncertainty estimates or error bars on the predicted parameters, or the explicit implementation of third-light and dilution corrections in the physics-guided post-processing step.
  2. [General presentation] Clarify how the physics-guided post-processing is applied to the raw ML outputs and whether it is deterministic or introduces additional free parameters, to support reproducibility.
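On the uncertainty point: one standard way to attach error bars to ensemble predictions, echoing the paper's per-tree analysis in Figure 11, is to collect the individual tree predictions. A sketch with placeholder data and target (the paper fits a GMM to the per-tree histogram; here the spread alone is used):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 51))                      # placeholder features
y = 2.0 * X[:, 0] + rng.normal(0.0, 0.1, size=300)  # placeholder target

rf = RandomForestRegressor(n_estimators=500, random_state=42).fit(X, y)

# Collect the 500 individual tree predictions for one system; their spread
# is a simple per-object confidence measure.
tree_preds = np.array([t.predict(X[:1])[0] for t in rf.estimators_])
mu, sigma = tree_preds.mean(), tree_preds.std()
```

The forest's point prediction is the mean of the tree predictions, so `mu` matches `rf.predict` while `sigma` adds the missing per-object uncertainty.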

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which have prompted us to clarify the scope of our claims regarding generalization. We respond point by point to the major comments below.

read point-by-point responses
  1. Referee: [Abstract and Kepler validation section] The cross-match with independent Kepler catalogs yields only 77% classification accuracy together with systematic offsets in recovered parameters, explicitly attributed to selection biases, third-light dilution, and photometric-versus-spectroscopic differences. Because the central claim requires the framework to bridge well-characterized systems to blind survey detections, this performance degradation under distribution shift is load-bearing and needs quantitative mitigation or additional tests to substantiate the generalization assertion.

    Authors: We agree that the Kepler cross-match constitutes a key test of generalization to survey-like data and that the drop to 77% accuracy with systematic parameter offsets is an important indicator of distribution shift. The manuscript already reports these results transparently and attributes them to the cited factors. In revision we will expand the Kepler validation section with a quantitative tabulation of per-parameter biases and scatters relative to the Kepler values, together with a brief discussion of how third-light corrections could be incorporated into the existing post-processing step. We note, however, that complete removal of the shift would require a training set drawn from blind survey detections with independent high-precision parameters, which does not yet exist at the necessary scale; the current external validations therefore illustrate both the promise and the realistic limits of the approach. revision: partial

  2. Referee: [Training data description and methodology] The 845 training systems are selected as 'well-characterized,' which by construction biases the sample toward objects with prior high-quality follow-up and therefore differs in period, amplitude, SNR, and morphology from typical blind survey detections. The 15% held-out test set remains inside this distribution, and the 51 phase-folded features plus post-processing have not been shown to be invariant under the shift that defines the target application, as indicated by the Kepler results.

    Authors: We concur that the training sample is necessarily biased toward systems with existing high-quality characterization, and that the held-out test set therefore lies within the same distribution. This is an unavoidable consequence of supervised learning when reliable ground-truth parameters are required. The 51 features were selected on physical grounds to capture morphology and potential information from phase-folded light curves; their robustness is evidenced by the strong held-out and OGLE performance. We will revise the training-data and methodology sections to include an explicit discussion of the expected distribution shift, supported by a comparison of feature statistics between the training sample and the Kepler cross-matches, and will add a dedicated limitations subsection. Full invariance cannot be demonstrated without additional labeled data from blind surveys, but the proposed textual changes will make the current evidence and its boundaries clearer to readers. revision: partial

Circularity Check

0 steps flagged

Standard supervised ML training and external validation exhibit no circularity

full rationale

The paper implements a conventional supervised learning pipeline: 51 features are extracted from phase-folded light curves of 845 pre-characterized systems, models (Random Forest, XGBoost) are trained to map those features to morphology labels and physical parameters, and performance is measured on an explicitly withheld 15% test set plus cross-matches to independent catalogs (OGLE OCVS, Kepler). No derivation, equation, or uniqueness claim reduces the output to the input by construction; the held-out predictions and external validations are statistically independent of the training labels. Self-citations are absent from the load-bearing steps, and no ansatz or fitted parameter is relabeled as a first-principles result.
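The split discipline described here can be made concrete: a stratified held-out split taken before any training, then 5-fold stratified cross-validation with non-overlapping validation indices on the remainder. A sketch (features and class labels are placeholders):

```python
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(995, 51))
y = rng.integers(0, 3, size=995)   # D / SD / C labels (placeholder)

# 15% withheld before any model sees the data (995 -> 150 test, 845 CV pool).
X_cv, X_test, y_cv, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42)

# 5-fold stratified CV on the remaining systems: validation indices are
# strictly non-overlapping across folds by construction.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
folds = list(skf.split(X_cv, y_cv))
```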

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework rests on the quality and representativeness of the training catalog plus standard supervised-learning assumptions; no new physical entities are postulated.

free parameters (2)
  • XGBoost and Random Forest hyperparameters
    Tuned on the training data; exact values and search procedure not stated in abstract.
  • Selection of the 51 domain-specific features
    Feature engineering choices that affect all downstream results.
axioms (2)
  • domain assumption Parameters of the 845 training systems constitute accurate ground truth.
    Required for supervised learning; any systematic errors in the training catalog propagate directly.
  • domain assumption Phase-folded light curves and the 51 extracted features retain sufficient information for both classification and regression.
    Core premise of the feature-based approach.

pith-pipeline@v0.9.0 · 5609 in / 1457 out tokens · 43863 ms · 2026-05-10T01:20:56.200134+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

28 extracted references · 23 canonical work pages

  1. Armstrong, D. J., Kirk, J., Lam, K. W. F., et al. 2016, MNRAS, 456, 2260, doi: 10.1093/mnras/stv2836
  2. Hargis, J. R. 2004, in American Astronomical Society Meeting Abstracts, Vol. 204, 05.01
  3. Breiman, L. 2001, Machine Learning, 45, 5, doi: 10.1023/A:1010933404324
  4. Chen, T., & Guestrin, C. 2016, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 785–794, doi: 10.1145/2939672.2939785
  5. Chen, X., Wang, S., Deng, L., et al. 2020, ApJS, 249, 18, doi: 10.3847/1538-4365/ab9cae
  6. Claret, A., & Bloemen, S. 2011, A&A, 529, A75, doi: 10.1051/0004-6361/201116451
  7. Ding, X., Song, Z., Wang, C., & Ji, K. 2024, AJ, 167, 192, doi: 10.3847/1538-3881/ad3048
     D'Isanto, A., & Polsterer, K. L. 2018, A&A, 609, A111, doi: 10.1051/0004-6361/201731326
  8. Fritsch, F. N., & Carlson, R. E. 1980, SIAM Journal on Numerical Analysis, 17, 238, doi: 10.1137/0717021
  9. Gray, D. F., & Nagel, T. 1989, ApJ, 341, 421, doi: 10.1086/167505
  10. Kirk, B., Conroy, K., Prša, A., et al. 2016, AJ, 151, 68, doi: 10.3847/0004-6256/151/3/68
      Latković, O., Čeki, A., & Lazarević, S. 2021, ApJS, 254, 10, doi: 10.3847/1538-4365/abeb23
  11. Li, K., & Wang, L.-H. 2025, ApJS, 277, 51, doi: 10.3847/1538-4365/adba63
  12. Lucy, L. B. 1967, Zeitschrift für Astrophysik, 65, 89
  13. Mahalanobis, P. C. 1936, Proceedings of the National Institute of Sciences of India, 2, 49
  14. Mowlavi, N., Holl, B., Lecoeur-Taïbi, I., et al. 2023, A&A, 674, A16, doi: 10.1051/0004-6361/202245330
  15. Nataf, D. M., Gould, A., Fouqué, P., et al. 2013, ApJ, 769, 88, doi: 10.1088/0004-637X/769/2/88
      Parimucha, Š., Gabdeev, M., Markus, Y., Vaňko, M., & Gajdoš, P. 2025, Astronomy and Computing, 53, 100998, doi: 10.1016/j.ascom.2025.100998
  16. Pawlak, M., Soszyński, I., Udalski, A., et al. 2016, Acta Astronomica, 66, 421, doi: 10.48550/arXiv.1612.06394
  17. Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, Journal of Machine Learning Research, 12, 2825
  18. Pojmanski, G. 2002, Acta Astronomica, 52, 397, doi: 10.48550/arXiv.astro-ph/0210283
      Prša, A., & Zwitter, T. 2005, ApJ, 628, 426, doi: 10.1086/430591
      Prša, A., Batalha, N., Slawson, R. W., et al. 2011, AJ, 141, 83, doi: 10.1088/0004-6256/141/3/83
      Ramírez, I., & Meléndez, J. 2005, ApJ, 626, 465, doi: 10.1086/430102
      Ruciński, S. M. 1969, Acta Astro...
  19. Rucinski, S. M. 1993, PASP, 105, 1433, doi: 10.1086/133326
  20. Sarro, L. M., Sánchez-Fernández, C., & Giménez, Á. 2006, A&A, 446, 395, doi: 10.1051/0004-6361:20052830
  21. Savitzky, A., & Golay, M. J. E. 1964, Analytical Chemistry, 36, 1627, doi: 10.1021/ac60214a047
  22. Shan, Y., Chen, J., Zhang, Z., et al. 2025, PASP, 137, 044503, doi: 10.1088/1538-3873/adc5a2
  23. Slawson, R. W., Prša, A., Welsh, W. F., et al. 2011, AJ, 142, 160, doi: 10.1088/0004-6256/142/5/160
      Soszyński, I., Pawlak, M., Pietrukowicz, P., et al. 2016, Acta Astronomica, 66, 405, doi: 10.48550/arXiv.1701.03105
  24. Southworth, J. 2015, in Astronomical Society of the Pacific Conference Series, Vol. 496, Living Together: Planets, Host Stars and Binaries, ed. S. M. Rucinski, G. Torres, & M. Zejda, 164, doi: 10.48550/arXiv.1411.1219
      Von Zeipel, H. 1924, MNRAS, 84, 665, doi: 10.1093/mnras/84.9.665
  25. Wilson, R. E., & Devinney, E. J. 1971, ApJ, 166, 605, doi: 10.1086/150986
  26. Wilson, R. E., Devinney, E. J., & Van Hamme, W. 2020, WD: Wilson-Devinney binary star modeling, http://ascl.net/2004.004
  27. Windemuth, D., Agol, E., Ali, A., & Kiefer, F. 2019, MNRAS, 489, 1644, doi: 10.1093/mnras/stz2137
  28. Xiong, J., Ding, X., Li, J., et al. 2024, ApJS, 270, 20, doi: 10.3847/1538-4365/ad0ceb