Machine Learning 45(1), 5–32 (Oct 2001)

Leo Breiman · 2001 · Machine Learning · DOI 10.1023/a:1010933404324

Mixed citation behavior. Most common role is background (55%).

90 Pith papers citing it

110k external citations · Crossref

Background 55% of classified citations

citation-role summary

background 12 · method 7 · baseline 1

citation-polarity summary

background 11 · use method 7 · baseline 1 · unclear 1

authors

Leo Breiman

counterfactual ablation

If this work disappeared, these are the nearest dependency candidates in Pith, weighted toward method, dataset, baseline, and extension contexts where available. This is a structural signal, not a retraction verdict.

representative citing papers

FLOATBench: A Dataset and Benchmark for Floating Offshore Wind Turbine Tower Fatigue

cs.AI · 2026-05-25 · unverdicted · novelty 8.0

FLOATBench is a tabular benchmark dataset with 582,120 fatigue labels from 19,404 OpenFAST simulations of three 22 MW FOWT towers, featuring alpha-shape regime partitioning and three evaluation protocols for surrogate models.

TSFMAudit: Data Contamination Auditing in Forecasting Time Series Foundation Models

cs.LG · 2026-05-24 · unverdicted · novelty 8.0

TSFMAudit detects pretraining contamination in time series foundation models via probe adaptation dynamics (faster loss drop, smaller backbone shift), tested on 6 models and 187 datasets against 10 LLM-derived baselines.

Self-Stigma Is Not a Monolith, but Generic Empathy Is: Persona-Conditioned LLM Support for People Who Use Drugs

cs.CL · 2026-06-22 · unverdicted · novelty 7.0 · 2 refs

Four self-stigma personas identified via LPA on 1,174 Reddit users; persona-conditioned LLMs achieve targeted shifts but experts prefer generic empathy baselines.

Polarisation and Faraday rotation measure imaging at metre wavelengths with sub-arcsecond resolution: a foundational calibration strategy

astro-ph.IM · 2026-06-16 · unverdicted · novelty 7.0

A calibration strategy using full-Jones corrections with an in-field unpolarised calibrator and visibility-based multi-epoch alignment enables sub-arcsecond polarimetric imaging with LOFAR at metre wavelengths.

TIDAL: Recovering Temporal Phase for Cloud Block Storage Placement from LLM-Derived Semantics

cs.OS · 2026-05-18 · unverdicted · novelty 7.0

TIDAL recovers temporal phase signals from LLM-derived semantics of provisioning metadata to enable complementary CVD placement, reducing overload frequency by 79.1% on production traces.

TabPFN-MT: A Natively Multitask In-Context Learner for Tabular Data

cs.LG · 2026-05-16 · unverdicted · novelty 7.0

TabPFN-MT is a multitask in-context learner for tabular data that sets a new state-of-the-art on deep multitask learning for datasets under 1000 samples while reducing inference cost from O(T) to O(1) passes.

The Nova Synthetic Data Base: A Principal Component/AI Analysis of Novae Synoptic Spectra

astro-ph.SR · 2026-05-14 · unverdicted · novelty 7.0

Presents the first public synthetic spectra database for novae and demonstrates a PCA/AI framework for retrieving physical properties from limited spectral data as a proof of concept for future surveys.

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.

LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification

cs.AI · 2026-05-08 · conditional · novelty 7.0

LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.

Intrinsic effective sample size for manifold-valued Markov chain Monte Carlo via kernel discrepancy

stat.ML · 2026-05-05 · unverdicted · novelty 7.0

An intrinsic effective sample size for manifold MCMC is defined via kernel discrepancy as the number of independent draws yielding equivalent expected squared discrepancy to the target.

Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks

cs.LG · 2026-05-02 · unverdicted · novelty 7.0

EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.

Profile Likelihood Inference for Anisotropic Hyperbolic Wrapped Normal Models on Hyperbolic Space

math.ST · 2026-05-01 · unverdicted · novelty 7.0

The profile maximum likelihood estimator for the location in anisotropic hyperbolic wrapped normal models is strongly consistent, asymptotically normal, and attains the Hájek-Le Cam minimax lower bound under squared geodesic loss.

SynQL: A Controllable and Scalable Rule-Based Framework for SQL Workload Synthesis for Performance Benchmarking

cs.DB · 2026-04-09 · unverdicted · novelty 7.0

SynQL synthesizes diverse, execution-ready SQL workloads by deterministically traversing foreign-key graphs to populate ASTs, yielding high topological entropy and cost-model training data with R² ≥ 0.79 on held-out sets.

A Perfect Storm: First-Nature Geography and Economic Development

econ.GN · 2024-08-01 · unverdicted · novelty 7.0

A 1825 storm created a new sea connection in Denmark, producing a 27 percent population increase (elasticity 1.6 to market access) driven by fertility and occupational change toward fishing and manufacturing, with symmetric medieval declines after waterway closure.

A Technical Typology of AI Systems in Public Administration

cs.CY · 2026-06-30 · unverdicted · novelty 6.0

The paper defines five AI system categories for public administration and reports that 55% of 91 recent papers leave the system type underspecified while 31% study one type but motivate with another.

A welding penetration prediction model for laser welding process based on self-supervised learning using physics-informed neural networks

cs.CV · 2026-06-24 · unverdicted · novelty 6.0

SimPhysNet achieves 96.06% accuracy classifying laser welding penetration states using self-supervised contrastive learning with a physics-informed neural network and prototypical networks on only 200 labeled images.

Are We Lost in the Woods? Detecting Silent Semantic Faults for Random Forest Classifiers with Data-informed Static Analysis

cs.SE · 2026-06-05 · unverdicted · novelty 6.0

dille detects silent semantic faults in random forest ML pipelines with 91% precision via data-informed static analysis on Kaggle notebooks, finding 12-18% of scripts affected.

Reactivity-Informed Machine Learning for Performance Prediction and Design Space Exploration of Alkali-Activated Slag

cond-mat.mtrl-sci · 2026-06-04 · unverdicted · novelty 6.0

Machine learning on the largest curated alkali-activated slag dataset shows that average metal oxide dissociation energy serves as a compact, physically interpretable reactivity descriptor enabling strength prediction and low-emission design space exploration.

Decision-Path Patterns as Tree Reliability Signals: Path-based Adaptive Weighting for Random Forest Classification

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

Path-based adaptive weighting of random forest trees via decision path patterns delivers statistically significant accuracy gains on 36 binary classification benchmarks with minimal class-recall regression.

Skew-adaptive conformal prediction

stat.ML · 2026-05-15 · unverdicted · novelty 6.0

Develops a skew-adaptive split conformal prediction method that learns local skewness via a gauge-derived conformity score and an asinh residual model while preserving marginal validity under exchangeability.

Neural Point-Forms

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

Neural point-forms are introduced as permutation-invariant neural layers that output learned form-comparison matrices for point clouds, with a claimed consistency proof under sampling and manifold assumptions and competitive results on synthetic and biological data.

Nonparametric inference for sublevel-set probabilities of conditional average treatment effect functions

stat.ME · 2026-05-14 · unverdicted · novelty 6.0

Develops Grenander-type and debiased machine learning estimators for the sublevel-set probability curve of the CATE function, shown to be non-pathwise differentiable, along with its piecewise linear approximation.

Semantic Feature Segmentation for Interpretable Predictive Maintenance in Complex Systems

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

Semantic segmentation decomposes monitoring features into canonical and residual components that concentrate fault-predictive information while preserving operational meaning in predictive maintenance.

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.

citing papers explorer

Showing 26 of 26 citing papers after filters.

TSFMAudit: Data Contamination Auditing in Forecasting Time Series Foundation Models

cs.LG · 2026-05-24 · unverdicted · none · ref 3

TSFMAudit detects pretraining contamination in time series foundation models via probe adaptation dynamics (faster loss drop, smaller backbone shift), tested on 6 models and 187 datasets against 10 LLM-derived baselines.

TabPFN-MT: A Natively Multitask In-Context Learner for Tabular Data

cs.LG · 2026-05-16 · unverdicted · none · ref 42

TabPFN-MT is a multitask in-context learner for tabular data that sets a new state-of-the-art on deep multitask learning for datasets under 1000 samples while reducing inference cost from O(T) to O(1) passes.

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

cs.LG · 2026-05-11 · unverdicted · none · ref 9

MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.

Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks

cs.LG · 2026-05-02 · unverdicted · none · ref 119

EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.

Decision-Path Patterns as Tree Reliability Signals: Path-based Adaptive Weighting for Random Forest Classification

cs.LG · 2026-05-20 · unverdicted · none · ref 5

Path-based adaptive weighting of random forest trees via decision path patterns delivers statistically significant accuracy gains on 36 binary classification benchmarks with minimal class-recall regression.

Neural Point-Forms

cs.LG · 2026-05-15 · unverdicted · none · ref 65

Neural point-forms are introduced as permutation-invariant neural layers that output learned form-comparison matrices for point clouds, with a claimed consistency proof under sampling and manifold assumptions and competitive results on synthetic and biological data.

RCProb: Probabilistic Rule Extraction for Efficient Simplification of Tree Ensembles

cs.LG · 2026-04-28 · unverdicted · none · ref 25

RCProb uses Dirichlet-smoothed class priors and Beta-smoothed condition likelihoods in a Naive Bayes formulation to extract rules from tree ensembles approximately 22 times faster than RuleCOSI+ while maintaining competitive accuracy and producing more compact rule sets on 33 benchmark datasets.

FETS Benchmark: Foundation Models Outperform Dataset-specific Machine Learning in Energy Time Series Forecasting

cs.LG · 2026-04-24 · unverdicted · none · ref 26

Foundation models outperform dataset-specific machine learning in energy time series forecasting across 54 datasets in 9 categories.

Cluster-Specific Localized Drift Detection for Efficient Batch Model Adaptation under Controlled Distribution Shift

cs.LG · 2026-06-20 · unverdicted · none · ref 7

A cluster-induced distribution shift simulation framework is proposed and used to evaluate six batch adaptation strategies including cluster-local ADWIN on five benchmark datasets.

SEAGAN: domain-Specific and Edge-Aware Graph Attention Network for Dynamic Plant Processes

cs.LG · 2026-06-17 · unverdicted · none · ref 46

SEAGAN applies a domain-specific graph attention network to classify limitation states in A-Ci curves, achieving F1-score 0.857 and accuracy 0.882 on synthetic data with known ground truth.

Controllable Molecular Generative Foundation Models

cs.LG · 2026-05-14 · unverdicted · none · ref 35

CoMole combines motif-aware graph diffusion with RL policy optimization to deliver controllable molecular generation that outperforms baselines on nine targets across materials and drug benchmarks while keeping high validity.

Proposal and study of statistical features for string similarity computation and classification

cs.LG · 2026-05-14 · unverdicted · none · ref 1

Adapts COM and RLM statistical features from visual computing to string similarity, outperforming distance-based and other measures in synthetic tests (P<0.001 in 3/4 cases) and achieving best results on a plagiarism dataset.

Knowledge-Data Dually Driven Paradigm for Accurate Landslide Susceptibility Prediction under Data-Scarce Conditions Using Geomorphic Priors and Tabular Foundation Model

cs.LG · 2026-04-28 · unverdicted · none · ref 5

A knowledge-data dual paradigm using geomorphic priors and a tabular foundation model achieves baseline-level landslide susceptibility prediction accuracy with only 30% of typical data in tested regions.

Interpretable Quantile Regression by Optimal Decision Trees

cs.LG · 2026-04-22 · unverdicted · none · ref 2

A novel algorithm learns sets of optimal quantile regression trees to predict full conditional distributions interpretably and efficiently.

A Comparative Analysis of Machine Learning Algorithms for Multi-Task Prediction of the Parameters of the Pectin Hydrolysis–Extraction Process

cs.LG · 2026-05-30 · unverdicted · none · ref 2

CatBoost achieved the highest average R-squared value of about 0.946 in a multi-task regression task for pectin process parameters, with raw material type identified as the most influential input feature.

Interpretable Policy Distillation for Power Grid Topology Control

cs.LG · 2026-05-30 · unverdicted · none · ref 3

PPO policy for grid topology control is distilled into decision trees and random forests that outperform the teacher on reward and survival time with lower inference cost and high interpretability.

AI-Guided Design and Optimization of Graphite-Based Anodes via Iterative Experimental Feedback

cs.LG · 2026-05-29 · unverdicted · none · ref 13

Iterative AI-guided optimization of graphite anodes via the Citrine Platform raised fabrication success to 100 percent, high-capacity cell fraction to 84.8 percent, and capacity retention to 97.3 percent.

Ti-iLSTM: A TinyDL Approach for Logic-Level Anomaly Detection in Industrial Water Treatment Systems

cs.LG · 2026-05-15 · unverdicted · none · ref 13

Ti-iLSTM optimizes LSTM for TinyDL to detect logic-layer deception anomalies in PLC-based IWTS, reporting F1=0.983 and AUC=0.998 on SWaT with validation on WADI.

On Improving Graph Neural Networks for QSAR by Pre-training on Extended-Connectivity Fingerprints

cs.LG · 2026-05-11 · unverdicted · none · ref 17

Pre-training GNNs on ECFP prediction produces statistically significant QSAR gains on five of six Biogen benchmarks with OOD splits, but underperforms on heterogeneous datasets and complex endpoints like binding affinity.

Generating Synthetic Malware Samples Using Generative AI

cs.LG · 2026-04-23 · conditional · none · ref 23

Opcode-sequence generative models produce synthetic malware data that raises minor-class classification accuracy by up to 60% and overall detection to 96%.

Implicit neural representations as a coordinate-based framework for continuous environmental field reconstruction from sparse ecological observations

cs.LG · 2026-04-20 · unverdicted · none · ref 13

Implicit neural representations enable stable, resolution-independent reconstruction of continuous environmental fields from sparse and irregular ecological data.

Community-Based Early-Stage Chronic Kidney Disease Screening using Explainable Machine Learning for Low-Resource Settings

cs.LG · 2026-01-03 · unverdicted · none · ref 63

Machine learning models trained on Bangladeshi community data achieve 89-90% balanced accuracy for early CKD detection using few accessible features, outperforming traditional screening tools and generalizing across external datasets from India, UAE, and Bangladesh.

Autoencoder Architectures for Athlete Performance Scoring from Wearable Telemetry

cs.LG · 2026-06-26 · unverdicted · none · ref 10

Deep autoencoders outperform PCA and VAE variants on a composite of reconstruction MSE and interpretability metrics when reducing runner wearable data to a single latent performance score.

A machine-learning-assisted progressive digit-randomness screening framework for detecting non-random patterns in raw numerical research data

cs.LG · 2026-06-05 · unverdicted · none · ref 36

FDRS combines digit frequency tests, association metrics, entropy, KL divergence, and ML models to assign risk grades to numerical datasets, showing separation between normal and irregular simulated data with high AUC.

A Proof-of-Concept Simulation-Driven Digital Twin Framework for Decision-Aware Diabetes Modeling

cs.LG · 2026-05-11 · unverdicted · none · ref 29

A simulation-driven digital twin framework is shown to generate interpretable diabetes trajectories for decision-aware analysis by combining benchmark data with controlled synthetic scenarios.

STRIKE: Additive Feature-Group-Aware Stacking Framework for Credit Default Prediction

cs.LG · 2026-04-19 · unverdicted · none · ref 7

STRIKE improves credit default prediction AUC-ROC by training independent models on feature groups and aggregating their outputs via a meta-learner, outperforming tree baselines and conventional stacking on three real datasets.
