5 Pith papers cite this work.
citing papers explorer
- Are We Lost in the Woods? Detecting Silent Semantic Faults for Random Forest Classifiers with Data-informed Static Analysis
  dille detects silent semantic faults in random forest ML pipelines with 91% precision via data-informed static analysis on Kaggle notebooks, finding 12-18% of scripts affected.
- How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration
  A triplet-based plateau search algorithm is proposed to adaptively determine a near-minimal number of trees for random forests by monitoring relative OOB score changes across forest size triplets, removing n_trees from the TPE search space.
- Decision-Path Patterns as Tree Reliability Signals: Path-based Adaptive Weighting for Random Forest Classification
  Path-based adaptive weighting of random forest trees via decision path patterns delivers statistically significant accuracy gains on 36 binary classification benchmarks with minimal class-recall regression.
- Predicting disease severity and large-scale spread from coupled severity measurements and imperfect indicators: Application to beet yellows
  A two-step framework combines stacked hurdle random forest models for local severity prediction with semi-parametric spatio-temporal modeling to reconstruct large-scale disease dynamics from imperfect indicators, demonstrated on sugar beet yellows in France.
- DRAGNs in the Forest: Identifying Artifacts with Random Forest Models in the VLASS DRAGNs Catalog
  Random forest models classify VLASS DRAGNs by artifact count with 97% weighted F1 score, enabling extraction of a high-completeness artifact-free catalog.