mega hub

Random forests

Leo Breiman · 2001 · Machine Learning · DOI 10.1023/a:1010933404324

33 Pith papers cite this work, alongside 110,023 external citations. Polarity classification is still indexing.

33 Pith papers citing it

110k external citations · Crossref

open at publisher browse 33 citing papers more from Leo Breiman

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

background 3

citation-polarity summary

background 3

authors

Leo Breiman

mega hub controls

export citing contexts JSON export graph JSON export full bundle JSON annotated reader queued

Recognition alignment

counterfactual ablation

If this work disappeared, these are the nearest dependency candidates in Pith, weighted toward method, dataset, baseline, and extension contexts where available. This is a structural signal, not a retraction verdict.

co-cited works

representative citing papers

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.

LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification

cs.AI · 2026-05-08 · conditional · novelty 7.0

LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.

Intrinsic effective sample size for manifold-valued Markov chain Monte Carlo via kernel discrepancy

stat.ML · 2026-05-05 · unverdicted · novelty 7.0

An intrinsic effective sample size for manifold MCMC is defined via kernel discrepancy as the number of independent draws yielding equivalent expected squared discrepancy to the target.

Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks

cs.LG · 2026-05-02 · unverdicted · novelty 7.0

EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.

Profile Likelihood Inference for Anisotropic Hyperbolic Wrapped Normal Models on Hyperbolic Space

math.ST · 2026-05-01 · unverdicted · novelty 7.0

The profile maximum likelihood estimator for the location in anisotropic hyperbolic wrapped normal models is strongly consistent, asymptotically normal, and attains the Hájek-Le Cam minimax lower bound under squared geodesic loss.

SynQL: A Controllable and Scalable Rule-Based Framework for SQL Workload Synthesis for Performance Benchmarking

cs.DB · 2026-04-09 · unverdicted · novelty 7.0

SynQL synthesizes diverse, execution-ready SQL workloads by deterministically traversing foreign-key graphs to populate ASTs, yielding high topological entropy and cost-model training data with R² ≥ 0.79 on held-out sets.

Reciprocal Co-Training (RCT): Coupling Gradient-Based and Non-Differentiable Models via Reinforcement Learning

cs.CL · 2026-03-24 · unverdicted · novelty 7.0

RCT couples an LLM and Random Forest via RL feedback so each augments the other's features and rewards, producing consistent gains on three medical datasets.

Semantic Feature Segmentation for Interpretable Predictive Maintenance in Complex Systems

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

Semantic segmentation decomposes monitoring features into canonical and residual components that concentrate fault-predictive information while preserving operational meaning in predictive maintenance.

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.

Vesselpose: Vessel Graph Reconstruction from Learned Voxel-wise Direction Vectors in 3D Vascular Images

cs.CV · 2026-05-01 · unverdicted · novelty 6.0

Vesselpose predicts voxel-wise direction vectors to extend the TEASAR algorithm for topologically accurate vascular graph reconstruction from 3D images.

RCProb: Probabilistic Rule Extraction for Efficient Simplification of Tree Ensembles

cs.LG · 2026-04-28 · unverdicted · novelty 6.0

RCProb uses Dirichlet-smoothed class priors and Beta-smoothed condition likelihoods in a Naive Bayes formulation to extract rules from tree ensembles approximately 22 times faster than RuleCOSI+ while maintaining competitive accuracy and producing more compact rule sets on 33 benchmark datasets.

StarCLR: Contrastive Learning Representation for Astronomical Light Curves

astro-ph.SR · 2026-04-27 · conditional · novelty 6.0

StarCLR pretrains on TESS light curves via contrastive learning on overlapping subsequences and improves variable star classification F1 scores over scratch-trained models when fine-tuned on TESS, ZTF, and Gaia.

Resource-Lean Lexicon Induction for German Dialects

cs.CL · 2026-04-26 · accept · novelty 6.0

Random forests on string similarity features outperform LLMs for German dialect lexicon induction and boost dialect information retrieval by up to 50% in recall.

FETS Benchmark: Foundation Models Outperform Dataset-specific Machine Learning in Energy Time Series Forecasting

cs.LG · 2026-04-24 · unverdicted · novelty 6.0

Foundation models outperform dataset-specific machine learning in energy time series forecasting across 54 datasets in 9 categories.

ReSS: Learning Reasoning Models for Tabular Data Prediction via Symbolic Scaffold

cs.AI · 2026-04-15 · unverdicted · novelty 6.0

ReSS uses decision-tree scaffolds to fine-tune LLMs for faithful tabular reasoning, reporting up to 10% gains over baselines on medical and financial data.

Identifying Changing-Look AGN Transitions in Light Curve Data with the Zwicky Transient Facility

astro-ph.GA · 2026-04-13 · unverdicted · novelty 6.0

A criterion of |Δg| > 0.4 mag and |Δ(g-r)| > 0.2 mag detects photometric CL-AGN transitions in 9.6% of known hosts with 1.6% false positive rate from simulations.

Mind the Gap? A Distributional Comparison of Real and Synthetic Priors for Tabular Foundation Models

cs.AI · 2026-05-07 · unverdicted · novelty 5.0

The synthetic prior for tabular foundation models covers only a narrow part of real table distributions, but this mismatch does not degrade model generalization.

Data-Efficient Indentation Size Effect Correction in Steels Using Machine Learning and Physics-Guided Augmentation

cond-mat.mtrl-sci · 2026-04-30 · unverdicted · novelty 5.0

Physics-guided data augmentation combined with neural networks enables accurate indentation size effect correction in steels from small sets of shallow nanoindentation measurements, outperforming Nix-Gao in the shallow regime.

Knowledge-Data Dually Driven Paradigm for Accurate Landslide Susceptibility Prediction under Data-Scarce Conditions Using Geomorphic Priors and Tabular Foundation Model

cs.LG · 2026-04-28 · unverdicted · novelty 5.0

A knowledge-data dual paradigm using geomorphic priors and a tabular foundation model achieves baseline-level landslide susceptibility prediction accuracy with only 30% of typical data in tested regions.

Interpretable Quantile Regression by Optimal Decision Trees

cs.LG · 2026-04-22 · unverdicted · novelty 5.0

A novel algorithm learns sets of optimal quantile regression trees to predict full conditional distributions interpretably and efficiently.

Is the `Known' Enough? An Integrated Machine Learning Framework for Eclipsing Binary Classification and Parameter Estimation Based on Well-Characterized Systems

astro-ph.SR · 2026-04-21 · conditional · novelty 5.0

An ensemble ML framework achieves 90.7% morphology classification accuracy and R² values of 0.77–0.92 for key parameters on held-out test data, with external validation against OGLE and Kepler catalogs.

The T16 Planet Hunt: 10,000 New Planet Candidates from TESS Cycle 1 and the Confirmation of a Hot Jupiter Around TIC 183374187

astro-ph.EP · 2026-04-20 · conditional · novelty 5.0

A transit search on TESS Cycle 1 full-frame images produced 10,091 new planet candidates down to T=16 mag, more than doubling the known TESS total, with one hot Jupiter confirmed by radial velocity.

Financial Dynamics and Interconnected Risk of Liquid Restaking

q-fin.GN · 2026-03-23 · unverdicted · novelty 5.0

Renzo liquid restaking revenue is primarily predicted by EigenLayer value locked, token yield, and multi-blockchain expansion, with current bridge risks not imposing systemic threats to the restaking ecosystem.

On Improving Graph Neural Networks for QSAR by Pre-training on Extended-Connectivity Fingerprints

cs.LG · 2026-05-11 · unverdicted · novelty 4.0

Pre-training GNNs on ECFP prediction produces statistically significant QSAR gains on five of six Biogen benchmarks with OOD splits, but underperforms on heterogeneous datasets and complex endpoints like binding affinity.

citing papers explorer

Showing 33 of 33 citing papers.

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image cs.LG · 2026-05-11 · unverdicted · none · ref 9
MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.
LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification cs.AI · 2026-05-08 · conditional · none · ref 126
LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.
Intrinsic effective sample size for manifold-valued Markov chain Monte Carlo via kernel discrepancy stat.ML · 2026-05-05 · unverdicted · none · ref 281
An intrinsic effective sample size for manifold MCMC is defined via kernel discrepancy as the number of independent draws yielding equivalent expected squared discrepancy to the target.
Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks cs.LG · 2026-05-02 · unverdicted · none · ref 119
EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.
Profile Likelihood Inference for Anisotropic Hyperbolic Wrapped Normal Models on Hyperbolic Space math.ST · 2026-05-01 · unverdicted · none · ref 271
The profile maximum likelihood estimator for the location in anisotropic hyperbolic wrapped normal models is strongly consistent, asymptotically normal, and attains the Hájek-Le Cam minimax lower bound under squared geodesic loss.
SynQL: A Controllable and Scalable Rule-Based Framework for SQL Workload Synthesis for Performance Benchmarking cs.DB · 2026-04-09 · unverdicted · none · ref 28
SynQL synthesizes diverse, execution-ready SQL workloads by deterministically traversing foreign-key graphs to populate ASTs, yielding high topological entropy and cost-model training data with R² ≥ 0.79 on held-out sets.
Reciprocal Co-Training (RCT): Coupling Gradient-Based and Non-Differentiable Models via Reinforcement Learning cs.CL · 2026-03-24 · unverdicted · none · ref 8
RCT couples an LLM and Random Forest via RL feedback so each augments the other's features and rewards, producing consistent gains on three medical datasets.
Semantic Feature Segmentation for Interpretable Predictive Maintenance in Complex Systems cs.AI · 2026-05-14 · unverdicted · none · ref 28
Semantic segmentation decomposes monitoring features into canonical and residual components that concentrate fault-predictive information while preserving operational meaning in predictive maintenance.
Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus cs.CL · 2026-05-01 · unverdicted · none · ref 12
Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.
Vesselpose: Vessel Graph Reconstruction from Learned Voxel-wise Direction Vectors in 3D Vascular Images cs.CV · 2026-05-01 · unverdicted · none · ref 70
Vesselpose predicts voxel-wise direction vectors to extend the TEASAR algorithm for topologically accurate vascular graph reconstruction from 3D images.
RCProb: Probabilistic Rule Extraction for Efficient Simplification of Tree Ensembles cs.LG · 2026-04-28 · unverdicted · none · ref 25
RCProb uses Dirichlet-smoothed class priors and Beta-smoothed condition likelihoods in a Naive Bayes formulation to extract rules from tree ensembles approximately 22 times faster than RuleCOSI+ while maintaining competitive accuracy and producing more compact rule sets on 33 benchmark datasets.
StarCLR: Contrastive Learning Representation for Astronomical Light Curves astro-ph.SR · 2026-04-27 · conditional · none · ref 7
StarCLR pretrains on TESS light curves via contrastive learning on overlapping subsequences and improves variable star classification F1 scores over scratch-trained models when fine-tuned on TESS, ZTF, and Gaia.
Resource-Lean Lexicon Induction for German Dialects cs.CL · 2026-04-26 · accept · none · ref 9
Random forests on string similarity features outperform LLMs for German dialect lexicon induction and boost dialect information retrieval by up to 50% in recall.
FETS Benchmark: Foundation Models Outperform Dataset-specific Machine Learning in Energy Time Series Forecasting cs.LG · 2026-04-24 · unverdicted · none · ref 26
Foundation models outperform dataset-specific machine learning in energy time series forecasting across 54 datasets in 9 categories.
ReSS: Learning Reasoning Models for Tabular Data Prediction via Symbolic Scaffold cs.AI · 2026-04-15 · unverdicted · none · ref 8
ReSS uses decision-tree scaffolds to fine-tune LLMs for faithful tabular reasoning, reporting up to 10% gains over baselines on medical and financial data.
Identifying Changing-Look AGN Transitions in Light Curve Data with the Zwicky Transient Facility astro-ph.GA · 2026-04-13 · unverdicted · none · ref 7
A criterion of |Δg| > 0.4 mag and |Δ(g-r)| > 0.2 mag detects photometric CL-AGN transitions in 9.6% of known hosts with 1.6% false positive rate from simulations.
Mind the Gap? A Distributional Comparison of Real and Synthetic Priors for Tabular Foundation Models cs.AI · 2026-05-07 · unverdicted · none · ref 6
The synthetic prior for tabular foundation models covers only a narrow part of real table distributions, but this mismatch does not degrade model generalization.
Data-Efficient Indentation Size Effect Correction in Steels Using Machine Learning and Physics-Guided Augmentation cond-mat.mtrl-sci · 2026-04-30 · unverdicted · none · ref 49
Physics-guided data augmentation combined with neural networks enables accurate indentation size effect correction in steels from small sets of shallow nanoindentation measurements, outperforming Nix-Gao in the shallow regime.
Knowledge-Data Dually Driven Paradigm for Accurate Landslide Susceptibility Prediction under Data-Scarce Conditions Using Geomorphic Priors and Tabular Foundation Model cs.LG · 2026-04-28 · unverdicted · none · ref 5
A knowledge-data dual paradigm using geomorphic priors and a tabular foundation model achieves baseline-level landslide susceptibility prediction accuracy with only 30% of typical data in tested regions.
Interpretable Quantile Regression by Optimal Decision Trees cs.LG · 2026-04-22 · unverdicted · none · ref 2
A novel algorithm learns sets of optimal quantile regression trees to predict full conditional distributions interpretably and efficiently.
Is the `Known' Enough? An Integrated Machine Learning Framework for Eclipsing Binary Classification and Parameter Estimation Based on Well-Characterized Systems astro-ph.SR · 2026-04-21 · conditional · none · ref 3
An ensemble ML framework achieves 90.7% morphology classification accuracy and R² values of 0.77–0.92 for key parameters on held-out test data, with external validation against OGLE and Kepler catalogs.
The T16 Planet Hunt: 10,000 New Planet Candidates from TESS Cycle 1 and the Confirmation of a Hot Jupiter Around TIC 183374187 astro-ph.EP · 2026-04-20 · conditional · none · ref 10
A transit search on TESS Cycle 1 full-frame images produced 10,091 new planet candidates down to T=16 mag, more than doubling the known TESS total, with one hot Jupiter confirmed by radial velocity.
Financial Dynamics and Interconnected Risk of Liquid Restaking q-fin.GN · 2026-03-23 · unverdicted · none · ref 21
Renzo liquid restaking revenue is primarily predicted by EigenLayer value locked, token yield, and multi-blockchain expansion, with current bridge risks not imposing systemic threats to the restaking ecosystem.
On Improving Graph Neural Networks for QSAR by Pre-training on Extended-Connectivity Fingerprints cs.LG · 2026-05-11 · unverdicted · none · ref 17
Pre-training GNNs on ECFP prediction produces statistically significant QSAR gains on five of six Biogen benchmarks with OOD splits, but underperforms on heterogeneous datasets and complex endpoints like binding affinity.
Generating Synthetic Malware Samples Using Generative AI cs.LG · 2026-04-23 · conditional · none · ref 23
Opcode-sequence generative models produce synthetic malware data that raises minor-class classification accuracy by up to 60% and overall detection to 96%.
Predicting Redshift in Seyfert Galaxies Using Machine Learning astro-ph.GA · 2026-04-20 · conditional · none · ref 1
Random Forest regression on combined optical plus mid-infrared colors yields NMAD of 0.0188, R-squared of 0.9561, and 0.294 percent outliers for photometric redshifts in 23,797 Seyfert II galaxies selected from SDSS and WISE.
Implicit neural representations as a coordinate-based framework for continuous environmental field reconstruction from sparse ecological observations cs.LG · 2026-04-20 · unverdicted · none · ref 13
Implicit neural representations enable stable, resolution-independent reconstruction of continuous environmental fields from sparse and irregular ecological data.
Impact of Validation Strategy on Machine Learning Performance in EEG-Based Alcoholism Classification eess.SP · 2026-04-11 · unverdicted · none · ref 17
Nested cross-validation reveals optimistic bias in standard validation for EEG alcoholism classification, with AdaBoost reaching 78.3% accuracy and most model differences not statistically significant per McNemar's test.
A Proof-of-Concept Simulation-Driven Digital Twin Framework for Decision-Aware Diabetes Modeling cs.LG · 2026-05-11 · unverdicted · none · ref 29
A simulation-driven digital twin framework is shown to generate interpretable diabetes trajectories for decision-aware analysis by combining benchmark data with controlled synthetic scenarios.
An Explainable Unsupervised-to-Supervised Machine Learning Framework for Dietary Pattern Discovery Using UK National Dietary Survey Data q-bio.QM · 2026-05-07 · unverdicted · none · ref 26
An unsupervised-to-supervised ML pipeline on UK NDNS data discovers four dietary patterns, reproduces them with macro-F1 0.963 using a surrogate classifier, and interprets them via SHAP for potential clinical use.
STRIKE: Additive Feature-Group-Aware Stacking Framework for Credit Default Prediction cs.LG · 2026-04-19 · unverdicted · none · ref 7
STRIKE improves credit default prediction AUC-ROC by training independent models on feature groups and aggregating their outputs via a meta-learner, outperforming tree baselines and conventional stacking on three real datasets.
fastml: Guarded Resampling Workflows for Safer Automated Machine Learning in R stat.CO · 2026-04-06 · unverdicted · none · ref 4
fastml is an R package that enforces leakage-free preprocessing through guarded resampling and provides a unified interface for safer automated ML including survival analysis.
Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models cs.CL · 2025-06-02 · unreviewed · ref 4

Random forests

hub tools

citation-role summary

citation-polarity summary

authors

mega hub controls

Recognition alignment

counterfactual ablation

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer