Four self-stigma personas identified via latent profile analysis (LPA) on 1,174 Reddit users; persona-conditioned LLMs achieve targeted shifts, but experts prefer generic empathy baselines.
Machine Learning 45(1), 5–32 (Oct 2001)
Mixed citation behavior. Most common role is background (55%).
representative citing papers
A calibration strategy using full-Jones corrections with an in-field unpolarised calibrator and visibility-based multi-epoch alignment enables sub-arcsecond polarimetric imaging with LOFAR at metre wavelengths.
TIDAL recovers temporal phase signals from LLM-derived semantics of provisioning metadata to enable complementary CVD placement, reducing overload frequency by 79.1% on production traces.
TabPFN-MT is a multitask in-context learner for tabular data that sets a new state of the art in deep multitask learning on datasets with fewer than 1,000 samples, while reducing inference cost from O(T) to O(1) forward passes.
Presents the first public synthetic spectra database for novae and demonstrates a PCA/AI framework for retrieving physical properties from limited spectral data as a proof of concept for future surveys.
MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.
LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.
An intrinsic effective sample size for manifold MCMC is defined via kernel discrepancy as the number of independent draws that would yield the same expected squared discrepancy to the target.
The EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples small enough to fit within context limits.
The profile maximum likelihood estimator for the location in anisotropic hyperbolic wrapped normal models is strongly consistent, asymptotically normal, and attains the Hájek-Le Cam minimax lower bound under squared geodesic loss.
SynQL synthesizes diverse, execution-ready SQL workloads by deterministically traversing foreign-key graphs to populate ASTs, yielding high topological entropy and cost-model training data with R² ≥ 0.79 on held-out sets.
A 1825 storm created a new sea connection in Denmark, producing a 27 percent population increase (an elasticity of 1.6 with respect to market access) driven by fertility and occupational change toward fishing and manufacturing, with symmetric medieval declines after the waterway closed.
SimPhysNet achieves 96.06% accuracy classifying laser welding penetration states using self-supervised contrastive learning with a physics-informed neural network and prototypical networks on only 200 labeled images.
dille detects silent semantic faults in random forest ML pipelines with 91% precision via data-informed static analysis on Kaggle notebooks, finding 12-18% of scripts affected.
Machine learning on the largest curated alkali-activated slag dataset shows that average metal oxide dissociation energy serves as a compact, physically interpretable reactivity descriptor enabling strength prediction and low-emission design space exploration.
Develops a skew-adaptive split conformal prediction method that learns local skewness via a gauge-derived conformity score and an asinh residual model while preserving marginal validity under exchangeability.
Neural point-forms are introduced as permutation-invariant neural layers that output learned form-comparison matrices for point clouds, with a claimed consistency proof under sampling and manifold assumptions and competitive results on synthetic and biological data.
Shows that the sublevel-set probability curve of the CATE function is not pathwise differentiable, and develops Grenander-type and debiased machine learning estimators for the curve and for its piecewise linear approximation.
Semantic segmentation decomposes monitoring features into canonical and residual components that concentrate fault-predictive information while preserving operational meaning in predictive maintenance.
Using a new non-inferiority testing framework, finds that machine translation preserves embedding similarity structure for ten languages in the Manifesto Corpus but distorts it for four.
Vesselpose predicts voxel-wise direction vectors to extend the TEASAR algorithm for topologically accurate vascular graph reconstruction from 3D images.
RCProb uses Dirichlet-smoothed class priors and Beta-smoothed condition likelihoods in a Naive Bayes formulation to extract rules from tree ensembles approximately 22 times faster than RuleCOSI+ while maintaining competitive accuracy and producing more compact rule sets on 33 benchmark datasets.
StarCLR pretrains on TESS light curves via contrastive learning on overlapping subsequences and improves variable star classification F1 scores over scratch-trained models when fine-tuned on TESS, ZTF, and Gaia.
Random forests on string similarity features outperform LLMs for German dialect lexicon induction and boost dialect information retrieval by up to 50% in recall.
citing papers explorer
-
Data-driven prediction of vortex-induced vibration response of marine risers subjected to three-dimensional current
Random forest regression trained on clustered 3D-current VIV experiments predicts riser response statistics and is compared against the VIVANA-FD semi-empirical tool.
-
Application of Deep Learning to Jet Charge Discrimination
Graph neural network achieves AUC of 0.883 for up versus anti-up quark jet charge discrimination in controlled QCD simulations.
-
Analysing drivers and interdependencies in European electricity markets using XAI
DNNs plus SHAP/SSHAP applied to 39 European bidding zones identify solar and gas as key price drivers and simulate a single-price EU market.
-
A Hybrid Quantum-Classical Approach for Melt Pool Prediction in Laser Powder Bed Fusion
Hybrid quantum-classical model with quantum feature encoding and clustering outperforms classical neural networks for LPBF melt pool prediction.
-
Enhancing Collective Self-Consumption through Water Storage Heater Flexibility
Simulating water heater flexibility in a 41-household collective self-consumption (CSC) community yields savings of €70 per household per year, 6% higher self-consumption, and 22% higher self-production; a real-world deployment examines technical performance and user acceptance.
-
A Comparative Analysis of Machine Learning Algorithms for Multi-Task Prediction of the Parameters of the Pectin Hydrolysis–Extraction Process
CatBoost achieved the highest average R-squared (about 0.946) in multi-task regression of pectin process parameters, with raw material type identified as the most influential input feature.
-
Interpretable Policy Distillation for Power Grid Topology Control
PPO policy for grid topology control is distilled into decision trees and random forests that outperform the teacher on reward and survival time with lower inference cost and high interpretability.
-
AI-Guided Design and Optimization of Graphite-Based Anodes via Iterative Experimental Feedback
Iterative AI-guided optimization of graphite anodes via the Citrine Platform raised fabrication success to 100 percent, high-capacity cell fraction to 84.8 percent, and capacity retention to 97.3 percent.
-
Ti-iLSTM: A TinyDL Approach for Logic-Level Anomaly Detection in Industrial Water Treatment Systems
Ti-iLSTM optimizes LSTM for TinyDL to detect logic-layer deception anomalies in PLC-based IWTS, reporting F1=0.983 and AUC=0.998 on SWaT with validation on WADI.
-
On Improving Graph Neural Networks for QSAR by Pre-training on Extended-Connectivity Fingerprints
Pre-training GNNs on ECFP prediction produces statistically significant QSAR gains on five of six Biogen benchmarks with OOD splits, but underperforms on heterogeneous datasets and complex endpoints like binding affinity.
-
Generating Synthetic Malware Samples Using Generative AI
Opcode-sequence generative models produce synthetic malware data that raises minority-class classification accuracy by up to 60% and overall detection accuracy to 96%.
-
Predicting Redshift in Seyfert Galaxies Using Machine Learning
Random Forest regression on combined optical plus mid-infrared colors yields NMAD of 0.0188, R-squared of 0.9561, and 0.294 percent outliers for photometric redshifts in 23,797 Seyfert II galaxies selected from SDSS and WISE.
-
Implicit neural representations as a coordinate-based framework for continuous environmental field reconstruction from sparse ecological observations
Implicit neural representations enable stable, resolution-independent reconstruction of continuous environmental fields from sparse and irregular ecological data.
-
Impact of Validation Strategy on Machine Learning Performance in EEG-Based Alcoholism Classification
Nested cross-validation reveals optimistic bias in standard validation for EEG alcoholism classification, with AdaBoost reaching 78.3% accuracy and most model differences not statistically significant per McNemar's test.
-
Comparative Evaluation of Machine Learning Models for Predicting Donor Kidney Discard
On 4,080 German deceased donors, an ensemble ML model reached an MCC of 0.76 for kidney discard prediction, with standardized preprocessing and feature selection proving more important than the specific algorithm chosen.
-
Community-Based Early-Stage Chronic Kidney Disease Screening using Explainable Machine Learning for Low-Resource Settings
Machine learning models trained on Bangladeshi community data achieve 89-90% balanced accuracy for early CKD detection using few accessible features, outperforming traditional screening tools and generalizing across external datasets from India, UAE, and Bangladesh.
-
Harvesting AlphaEarth: Benchmarking the Geospatial Foundation Model for Agricultural Downstream Tasks
AlphaEarth Foundations (AEF) embeddings perform competitively with remote sensing (RS) models on local agricultural tasks, but show limited spatial transferability, sensitivity to time, and limited interpretability.
-
Finding Quasars behind the Galactic Plane. IV. Candidate Selection from Chandra with Random Forest
A Random Forest classifier on Chandra, Gaia, and CatWISE data identifies 1060 new quasar candidates behind the Galactic plane, two of which are spectroscopically confirmed at z ≈ 1.1–1.3.
-
An aggregate learning approach for interpretable semi-supervised population prediction and disaggregation using ancillary data
An aggregate learning approach with a simple interpretable model matches or exceeds state-of-the-art performance on population disaggregation using ancillary data.
-
Autoencoder Architectures for Athlete Performance Scoring from Wearable Telemetry
Deep autoencoders outperform PCA and VAE variants on a composite of reconstruction MSE and interpretability metrics when reducing runner wearable data to a single latent performance score.
-
Machine Learning Approaches for Improved Scalability of Metallic Magnetic Calorimeters
Machine learning methods are explored for pulse classification, artifact rejection, and shape analysis in metallic magnetic calorimeters to improve scalability over traditional signal processing.
-
Modelling magnetic material properties with uncertainty-aware neural networks
Uncertainty-aware neural networks using Gaussian negative log-likelihood and dropout are applied to predict intrinsic magnetic properties and coercivity via graph neural networks in permanent magnet research.
-
A Guide to Estimating Conditional Average Treatment Effects in Competing Risks Settings
Compares six meta-learners (Cox/RSF risk models paired with elastic net/RF CATE models) via simulations differing in hazard complexity and censoring, and releases the R package crsurvlearners.
-
A machine-learning-assisted progressive digit-randomness screening framework for detecting non-random patterns in raw numerical research data
FDRS combines digit frequency tests, association metrics, entropy, KL divergence, and ML models to assign risk grades to numerical datasets, showing separation between normal and irregular simulated data with high AUC.
-
A Proof-of-Concept Simulation-Driven Digital Twin Framework for Decision-Aware Diabetes Modeling
A simulation-driven digital twin framework is shown to generate interpretable diabetes trajectories for decision-aware analysis by combining benchmark data with controlled synthetic scenarios.
-
An Explainable Unsupervised-to-Supervised Machine Learning Framework for Dietary Pattern Discovery Using UK National Dietary Survey Data
An unsupervised-to-supervised ML pipeline on UK NDNS data discovers four dietary patterns, reproduces them with macro-F1 0.963 using a surrogate classifier, and interprets them via SHAP for potential clinical use.
-
A Machine Learning Framework for EEG-Based Prediction of Treatment Efficacy in Chronic Neck Pain
A preprocessing pipeline for resting-state and motor-task EEG is described to support future machine learning models that predict treatment efficacy in chronic neck pain.
-
STRIKE: Additive Feature-Group-Aware Stacking Framework for Credit Default Prediction
STRIKE improves credit default prediction AUC-ROC by training independent models on feature groups and aggregating their outputs via a meta-learner, outperforming tree baselines and conventional stacking on three real datasets.
-
fastml: Guarded Resampling Workflows for Safer Automated Machine Learning in R
fastml is an R package that enforces leakage-free preprocessing through guarded resampling and provides a unified interface for safer automated ML including survival analysis.
-
Towards Accurate and Efficient Waste Image Classification: A Hybrid Deep Learning and Machine Learning Approach
A hybrid deep learning plus classical ML pipeline for waste image classification reaches up to 100% accuracy on TrashNet and a corrected household dataset while cutting feature dimensionality by over 95%.
- Missing Links in Public Email and Covert Networks: A Comparative Evaluation of Link Prediction, Hyperlink Prediction, and ERGM Estimation
- Controllable Molecular Generative Foundation Models
- Reciprocal Co-Training (RCT): Coupling Gradient-Based and Non-Differentiable Models via Reinforcement Learning