Machine Learning 45(1), 5–32 (Oct 2001)
Citing papers show mixed citation behavior; the most common citation role is background (55%).
Citing papers
-
Self-Stigma Is Not a Monolith, but Generic Empathy Is: Persona-Conditioned LLM Support for People Who Use Drugs
Four self-stigma personas identified via LPA on 1,174 Reddit users; persona-conditioned LLMs achieve targeted shifts but experts prefer generic empathy baselines.
-
Polarisation and Faraday rotation measure imaging at metre wavelengths with sub-arcsecond resolution: a foundational calibration strategy
A calibration strategy using full-Jones corrections with an in-field unpolarised calibrator and visibility-based multi-epoch alignment enables sub-arcsecond polarimetric imaging with LOFAR at metre wavelengths.
-
TIDAL: Recovering Temporal Phase for Cloud Block Storage Placement from LLM-Derived Semantics
TIDAL recovers temporal phase signals from LLM-derived semantics of provisioning metadata to enable complementary CVD placement, reducing overload frequency by 79.1% on production traces.
-
TabPFN-MT: A Natively Multitask In-Context Learner for Tabular Data
TabPFN-MT is a multitask in-context learner for tabular data that sets a new state-of-the-art on deep multitask learning for datasets under 1000 samples while reducing inference cost from O(T) to O(1) passes.
-
The Nova Synthetic Data Base: A Principal Component/AI Analysis of Novae Synoptic Spectra
Presents the first public synthetic spectra database for novae and demonstrates a PCA/AI framework for retrieving physical properties from limited spectral data as a proof of concept for future surveys.
-
MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.
-
LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification
LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.
-
Intrinsic effective sample size for manifold-valued Markov chain Monte Carlo via kernel discrepancy
An intrinsic effective sample size for manifold MCMC is defined via kernel discrepancy as the number of independent draws yielding equivalent expected squared discrepancy to the target.
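A minimal sketch of the definition, under assumptions that are not taken from the paper: the manifold is the circle S¹, the target is the uniform distribution, the kernel is k(a, b) = exp(κ cos(a − b)), and the discrepancy is the V-statistic squared MMD. For n i.i.d. uniform draws the expected squared discrepancy is then (e^κ − I₀(κ))/n, so solving for n at the chain's observed discrepancy gives a single-chain plug-in estimate. The function names are hypothetical.

```python
import numpy as np


def kernel_sq_discrepancy(theta, kappa=1.0):
    """Squared kernel discrepancy (V-statistic MMD^2) between points theta on
    the circle and the uniform distribution, with k(a, b) = exp(kappa*cos(a-b))."""
    theta = np.asarray(theta, dtype=float)
    diffs = theta[:, None] - theta[None, :]
    mean_kxx = np.exp(kappa * np.cos(diffs)).mean()
    # For the uniform target, E_Y k(x, Y) = E k(Y, Y') = I0(kappa), so
    # MMD^2 = mean k(x_i, x_j) - 2*I0 + I0 = mean k(x_i, x_j) - I0.
    return float(mean_kxx - np.i0(kappa))


def intrinsic_ess(theta, kappa=1.0):
    """Number of i.i.d. uniform draws whose expected squared discrepancy,
    (e^kappa - I0(kappa)) / n, equals the observed discrepancy of theta."""
    per_draw = float(np.exp(kappa) - np.i0(kappa))
    return per_draw / kernel_sq_discrepancy(theta, kappa)
```

In this toy setting a chain stuck at one point gets an ESS of exactly 1, i.i.d. uniform draws score on the order of the sample size, and a random walk with tiny steps scores close to 1.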
-
Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks
The EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples small enough to fit within context limits.
-
Profile Likelihood Inference for Anisotropic Hyperbolic Wrapped Normal Models on Hyperbolic Space
The profile maximum likelihood estimator for the location in anisotropic hyperbolic wrapped normal models is strongly consistent, asymptotically normal, and attains the Hájek-Le Cam minimax lower bound under squared geodesic loss.
-
SynQL: A Controllable and Scalable Rule-Based Framework for SQL Workload Synthesis for Performance Benchmarking
SynQL synthesizes diverse, execution-ready SQL workloads by deterministically traversing foreign-key graphs to populate ASTs, yielding high topological entropy and cost-model training data with R² ≥ 0.79 on held-out sets.
-
A Perfect Storm: First-Nature Geography and Economic Development
An 1825 storm created a new sea connection in Denmark, producing a 27 percent population increase (an elasticity of 1.6 with respect to market access) driven by fertility and an occupational shift toward fishing and manufacturing, with symmetric declines in the medieval period after the waterway closed.
-
A welding penetration prediction model for laser welding process based on self-supervised learning using physics-informed neural networks
SimPhysNet achieves 96.06% accuracy classifying laser welding penetration states using self-supervised contrastive learning with a physics-informed neural network and prototypical networks on only 200 labeled images.
-
Are We Lost in the Woods? Detecting Silent Semantic Faults for Random Forest Classifiers with Data-informed Static Analysis
dille detects silent semantic faults in random forest ML pipelines with 91% precision via data-informed static analysis on Kaggle notebooks, finding 12-18% of scripts affected.
-
Reactivity-Informed Machine Learning for Performance Prediction and Design Space Exploration of Alkali-Activated Slag
Machine learning on the largest curated alkali-activated slag dataset shows that average metal oxide dissociation energy serves as a compact, physically interpretable reactivity descriptor enabling strength prediction and low-emission design space exploration.
-
Skew-adaptive conformal prediction
Develops a skew-adaptive split conformal prediction method that learns local skewness via a gauge-derived conformity score and an asinh residual model while preserving marginal validity under exchangeability.
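The gauge-derived score and asinh residual model are not reproduced here. As a hedged illustration of where the marginal-validity guarantee comes from, the sketch below shows only the standard symmetric split-conformal step that such methods build on: absolute residuals on a held-out calibration set, with the ⌈(n + 1)(1 − α)⌉-th smallest score used as the interval half-width. The function name is hypothetical.

```python
import math

import numpy as np


def split_conformal_interval(cal_residuals, test_pred, alpha=0.1):
    """Symmetric split-conformal interval around a point prediction.

    Under exchangeability of calibration and test points, the interval covers
    the true value with probability at least 1 - alpha (marginally)."""
    scores = np.sort(np.abs(np.asarray(cal_residuals, dtype=float)))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    # With too few calibration points the valid interval is unbounded.
    q = np.inf if k > n else scores[k - 1]
    return test_pred - q, test_pred + q
```

A skew-adaptive method replaces the symmetric absolute-residual score with one that can widen the interval more on one side, while keeping this rank-based calibration step.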
-
Neural Point-Forms
Neural point-forms are introduced as permutation-invariant neural layers that output learned form-comparison matrices for point clouds, with a claimed consistency proof under sampling and manifold assumptions and competitive results on synthetic and biological data.
-
Nonparametric inference for sublevel-set probabilities of conditional average treatment effect functions
Develops Grenander-type and debiased machine learning estimators for the sublevel-set probability curve of the CATE function, which is shown not to be pathwise differentiable, and for its piecewise linear approximation.
-
Semantic Feature Segmentation for Interpretable Predictive Maintenance in Complex Systems
Semantic feature segmentation decomposes monitoring features into canonical and residual components that concentrate fault-predictive information while preserving operational meaning for predictive maintenance.
-
Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus
Using a new non-inferiority testing framework on the Manifesto Corpus, the study finds that machine translation preserves embedding similarity structure for ten languages but distorts it for four.
-
Vesselpose: Vessel Graph Reconstruction from Learned Voxel-wise Direction Vectors in 3D Vascular Images
Vesselpose predicts voxel-wise direction vectors to extend the TEASAR algorithm for topologically accurate vascular graph reconstruction from 3D images.
-
RCProb: Probabilistic Rule Extraction for Efficient Simplification of Tree Ensembles
RCProb uses Dirichlet-smoothed class priors and Beta-smoothed condition likelihoods in a Naive Bayes formulation to extract rules from tree ensembles approximately 22 times faster than RuleCOSI+ while maintaining competitive accuracy and producing more compact rule sets on 33 benchmark datasets.
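A minimal sketch of the smoothing idea, assuming each candidate rule is a conjunction of boolean conditions already evaluated on the training data. The hyperparameters alpha, a and b and the function names are illustrative assumptions, not RCProb's actual interface. Class priors get Dirichlet(α) add-α smoothing, each condition's per-class likelihood gets Beta(a, b) smoothing, and a rule's class posterior is scored naive-Bayes style as the prior times the product of its conditions' likelihoods.

```python
import numpy as np


def smoothed_naive_bayes(conds, y, n_classes, alpha=1.0, a=1.0, b=1.0):
    """Estimate smoothed class priors and per-class condition likelihoods.

    conds: (n_samples, n_conditions) 0/1 matrix, 1 where a condition holds.
    y: integer class labels in [0, n_classes)."""
    conds = np.asarray(conds, dtype=bool)
    y = np.asarray(y)
    counts = np.bincount(y, minlength=n_classes)
    # Dirichlet(alpha) smoothing of the class prior.
    prior = (counts + alpha) / (len(y) + n_classes * alpha)
    # Number of samples of each class on which each condition holds.
    true_counts = np.array([conds[y == c].sum(axis=0) for c in range(n_classes)])
    # Beta(a, b) smoothing of P(condition holds | class).
    lik = (true_counts + a) / (counts[:, None] + a + b)
    return prior, lik


def rule_posterior(rule_mask, prior, lik):
    """Naive-Bayes class posterior for a rule requiring the masked conditions."""
    log_p = np.log(prior) + np.log(lik[:, rule_mask]).sum(axis=1)
    p = np.exp(log_p - log_p.max())  # stabilise before normalising
    return p / p.sum()
```

Because every quantity is a smoothed count, scoring a rule needs no pass over the ensemble's trees, which is one plausible source of the speed-up the summary reports.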
-
StarCLR: Contrastive Learning Representation for Astronomical Light Curves
StarCLR pretrains on TESS light curves via contrastive learning on overlapping subsequences and improves variable star classification F1 scores over scratch-trained models when fine-tuned on TESS, ZTF, and Gaia.
-
Resource-Lean Lexicon Induction for German Dialects
Random forests on string similarity features outperform LLMs for German dialect lexicon induction and boost dialect information retrieval by up to 50% in recall.
-
FETS Benchmark: Foundation Models Outperform Dataset-specific Machine Learning in Energy Time Series Forecasting
Foundation models outperform dataset-specific machine learning in energy time series forecasting across 54 datasets in 9 categories.
-
ReSS: Learning Reasoning Models for Tabular Data Prediction via Symbolic Scaffold
ReSS extracts decision paths from trees as scaffolds to guide LLM reasoning generation, fine-tunes the LLM on the resulting dataset with scaffold-invariant augmentation, and reports up to 10% gains on medical and financial tabular benchmarks with new faithfulness metrics.
-
Identifying Changing-Look AGN Transitions in Light Curve Data with the Zwicky Transient Facility
A criterion of |Δg| > 0.4 mag and |Δ(g-r)| > 0.2 mag detects photometric CL-AGN transitions in 9.6% of known hosts, with a 1.6% false-positive rate estimated from simulations.
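A sketch of the selection criterion, assuming Δ is taken between two epochs of g- and r-band magnitudes; how the paper chooses epochs or bins the ZTF light curves is not specified in this summary. Applying the same flag to simulated non-changing-look light curves would give a false-positive rate. The function names are hypothetical.

```python
def is_cl_transition(g1, r1, g2, r2, dg_min=0.4, dcolor_min=0.2):
    """Flag a candidate changing-look transition between two epochs.

    Magnitudes are in mag. Absolute values make the test direction-agnostic,
    so both turn-on and turn-off events are flagged."""
    dg = g2 - g1
    dcolor = (g2 - r2) - (g1 - r1)
    return abs(dg) > dg_min and abs(dcolor) > dcolor_min


def flag_rate(epoch_pairs):
    """Fraction of (g1, r1, g2, r2) tuples flagged by the criterion."""
    flags = [is_cl_transition(*pair) for pair in epoch_pairs]
    return sum(flags) / len(flags)
```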
-
Detecting RAG Advertisements Across Advertising Styles
Entity recognition models detect ads in RAG responses effectively and stay robust when advertisers switch styles, while lightweight models like random forests and SVMs become brittle under the same changes.
-
Photometric Redshift PDFs via Neural Network Classification for DESI Legacy Imaging Surveys and Pan-STARRS
Neural network classification with CRPS optimization produces calibrated photometric redshift PDFs for DESI Legacy and Pan-STARRS data, achieving σ_NMAD of 0.0153 on LSDR10 and outperforming regression methods.
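For reference, a sketch of the two standard quantities, not the paper's code. σ_NMAD uses the common definition 1.48 × median(|Δz − median(Δz)| / (1 + z_spec)) with Δz = z_phot − z_spec; some papers use slightly different conventions. The CRPS is computed for a redshift PDF given as class probabilities on a uniform grid, approximated as a Riemann sum of the squared gap between the predicted CDF and the step function at the true redshift. The function names are hypothetical.

```python
import numpy as np


def sigma_nmad(z_spec, z_phot):
    """Normalised median absolute deviation of photometric redshift errors."""
    z_spec = np.asarray(z_spec, dtype=float)
    dz = np.asarray(z_phot, dtype=float) - z_spec
    return 1.48 * np.median(np.abs(dz - np.median(dz)) / (1.0 + z_spec))


def crps_binned(z_grid, probs, z_true):
    """Discretised CRPS for a PDF given as probabilities on a uniform grid."""
    z_grid = np.asarray(z_grid, dtype=float)
    probs = np.asarray(probs, dtype=float)
    cdf = np.cumsum(probs) / probs.sum()      # predicted CDF at grid points
    step = (z_grid >= z_true).astype(float)   # CDF of the true redshift
    width = z_grid[1] - z_grid[0]             # assumes uniform bin spacing
    return float(np.sum((cdf - step) ** 2) * width)
```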
-
Rethinking player evaluation in sports: Goals above expectation and beyond
A double machine learning framework that residualizes standard outcome-above-expectation metrics to support valid frequentist inference and player-specific effect estimation in sports analytics.
-
Climate-Driven Mortality Forecasting Using Deep Learning
CNN-LSTM and GNN-LSTM models added to a Lee-Carter baseline reduce test MSE by about 24% versus MortFCNet on French regional mortality data from 1990-2019, with largest gains at oldest ages.
-
Cluster-Specific Localized Drift Detection for Efficient Batch Model Adaptation under Controlled Distribution Shift
A cluster-induced distribution shift simulation framework is proposed and used to evaluate six batch adaptation strategies including cluster-local ADWIN on five benchmark datasets.
-
SEAGAN: domain-Specific and Edge-Aware Graph Attention Network for Dynamic Plant Processes
SEAGAN applies a domain-specific graph attention network to classify limitation states in A-Ci curves, achieving an F1-score of 0.857 and an accuracy of 0.882 on synthetic data with known ground truth.
-
Data-driven modeling of Galactic diffuse emission with multi-wavelength observations
Supervised ML models achieve R² > 0.90 when mapping multi-frequency radio data to 0.1-10 GeV gamma-ray intensity and attribute high-frequency radio bands to hadronic processes and low-frequency bands to leptonic processes.
-
Adaptive Estimation of Aggregated Values of Conditional Linear Programs
The support function of the identified set for solutions to conditional linear programs is expressed as an average of intersections of regression functions and shown to be a regular parameter admitting standard asymptotic inference.
-
A Retrospective Benchmark of Spatiotemporal Covariates for Daily Active-Fire Detection in Cerrado Conservation Units
The paper establishes a reproducible retrospective benchmark for ranking daily active-fire detections in Cerrado conservation units by comparing atmospheric, surface, static spatial, and short-term memory covariates with standard ML models under time-series cross-validation and held-out AOI tests.
-
Correlation between baryonic process and galaxy assembly bias
Simulations show gas cooling and stellar feedback dominate assembly bias for stellar-mass selected galaxies while star formation gives way to gas cooling for SFR-selected galaxies as number density rises.
-
ldmppr: Location Dependent Marked Point Processes in R
ldmppr is an R package providing tools to model, simulate from, and assess goodness-of-fit for location-dependent marked point processes.
-
A tool to determine the degrees of freedom in tree-structured varying coefficient models
A formula approximating degrees of freedom for tree-structured varying coefficient models is proposed to improve BIC model selection over naive parameter counting.
-
Exploring the Transitional Parameter Space of Blazars using Gamma-ray and X-ray Population Diagnostics
Changing-look blazars occupy intermediate regions in gamma-ray and X-ray parameter spaces but lie statistically closer to flat-spectrum radio quasars than to BL Lac objects according to centroids, PCA, UMAP, and random-forest classification.
-
Mind the Gap? A Distributional Comparison of Real and Synthetic Priors for Tabular Foundation Models
The synthetic prior for tabular foundation models covers only a narrow part of real table distributions, but this mismatch does not degrade model generalization.
-
Data-Efficient Indentation Size Effect Correction in Steels Using Machine Learning and Physics-Guided Augmentation
Physics-guided data augmentation combined with neural networks enables accurate indentation size effect correction in steels from small sets of shallow nanoindentation measurements, outperforming Nix-Gao in the shallow regime.
-
Knowledge-Data Dually Driven Paradigm for Accurate Landslide Susceptibility Prediction under Data-Scarce Conditions Using Geomorphic Priors and Tabular Foundation Model
A knowledge-data dual paradigm using geomorphic priors and a tabular foundation model achieves baseline-level landslide susceptibility prediction accuracy with only 30% of typical data in tested regions.
-
Interpretable Quantile Regression by Optimal Decision Trees
A novel algorithm learns sets of optimal quantile regression trees to predict full conditional distributions interpretably and efficiently.
-
Is the 'Known' Enough? An Integrated Machine Learning Framework for Eclipsing Binary Classification and Parameter Estimation Based on Well-Characterized Systems
An ensemble ML framework achieves 90.7% morphology classification accuracy and R² values of 0.77–0.92 for key parameters on held-out test data, with external validation against OGLE and Kepler catalogs.
-
The T16 Planet Hunt: 10,000 New Planet Candidates from TESS Cycle 1 and the Confirmation of a Hot Jupiter Around TIC 183374187
A transit search on TESS Cycle 1 full-frame images produced 10,091 new planet candidates down to T=16 mag, more than doubling the known TESS total, with one hot Jupiter confirmed by radial velocity.
-
Financial Dynamics and Interconnected Risk of Liquid Restaking
Renzo liquid restaking revenue is primarily predicted by EigenLayer value locked, token yield, and multi-blockchain expansion, with current bridge risks not imposing systemic threats to the restaking ecosystem.
-
From Time-series Generation, Model Selection to Transfer Learning: A Comparative Review of Pixel-wise Approaches for Large-scale Crop Mapping
A comparative review with experiments identifying optimal preprocessing, models, and transfer strategies for large-scale pixel-wise crop mapping using Landsat 8 data across five sites.
-
Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models
Inflectional features stay linearly decodable across all layers while lexical identity weakens with depth in modern transformers.
-
Fuzzy Convolution Neural Networks for Tabular Data Classification
FCNN maps tabular features to fuzzy memberships, arranges them as images, and uses CNNs to classify, reporting competitive or superior results versus DT, SVM, FNN, Bayes, and RF on six generated noisy datasets.