Distribution-Aware Reward optimizes LLM regression by treating rollouts as empirical predictive distributions and rewarding marginal improvements in CRPS quality rather than point accuracy alone.
hub
international conference on machine learning , pages=
20 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
years
2026 20roles
background 1polarities
unclear 1representative citing papers
DCO decouples tuning for efficiency from calibration for coverage in conformal prediction, maintaining marginal guarantees and reducing average set sizes on benchmarks like ImageNet-A and Diabetes.
Lang2MLIP is an LLM multi-agent framework that automates end-to-end development of machine learning interatomic potentials from natural language input for heterogeneous materials systems.
Probability-of-Hit acquisition function ranks perturbation candidates by posterior probability of threshold exceedance, with asymptotic optimality proof and up to 6.4% gains on real immunology data.
SeqRejectron constructs a stopping rule with a small set of validator policies to achieve horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.
DOSER detects OOD actions via diffusion-model denoising error and applies selective regularization based on predicted transitions, proving gamma-contraction with performance bounds and outperforming priors on offline RL benchmarks.
PODiff performs conditional diffusion in a fixed, variance-ordered POD latent space to enable efficient probabilistic super-resolution of high-dimensional scientific fields with lower memory and better-calibrated uncertainty than pixel-space or dropout baselines.
RisCoSet applies multiple hypothesis testing to construct risk-controlling partial-program prediction sets for LLM code generation, achieving up to 24.5% less code removal than prior methods at equivalent risk levels.
ReSIDe generalizes logit-based confidence scores to intermediate layers of synthetic image detectors and uses preference optimization to aggregate them, cutting area under the risk-coverage curve by up to 69.55% under covariate shifts.
Decoupled PFNs use controllable synthetic priors to train separate latent-signal and noise heads, making epistemic-aleatoric decomposition identifiable and improving acquisition in noisy settings.
Perturb-and-Correct generates epistemically diverse predictors from a single pretrained network via hidden-layer perturbations followed by affine least-squares corrections that enforce agreement on calibration data.
CHASE improves selective prediction under ambiguity by optimizing a ranking-aware selector over margins between competing temporal hypotheses, yielding up to 11% better alignment and 8.8% higher three-way accuracy than baselines on GUV-inspired tasks.
RePrompT uses recurrent prompt tuning to inject prior-visit latent states and cohort-derived population prompt tokens into LLMs, yielding better performance than pure EHR or pure LLM baselines on MIMIC clinical prediction tasks.
Avoiding CenterLoss improves OOD detection via multi-scale Mahalanobis on L2-normalized features, yielding 0.9483 AUROC on CIFAR-10 while preserving competitive in-distribution accuracy.
A framework uses stance detection, linear dimensionality reduction, and neural potential landscapes to recover a 3D stance space explaining 45% variance and to visualize large-scale shifts across platforms and years.
ERPPO adds a DSA-based ambiguity estimator to MAPPO and switches between L1 and L2 entropy regularization to improve exploration and stability in non-stationary multi-dimensional observations.
RADS applies reinforcement learning to pick informative samples for transfer learning, improving performance over uncertainty and diversity sampling in low-resource imbalanced clinical settings.
citing papers explorer
-
RePrompT: Recurrent Prompt Tuning for Integrating Structured EHR Encoders with Large Language Models
RePrompT uses recurrent prompt tuning to inject prior-visit latent states and cohort-derived population prompt tokens into LLMs, yielding better performance than pure EHR or pure LLM baselines on MIMIC clinical prediction tasks.
-
RADS: Reinforcement Learning-Based Sample Selection Improves Transfer Learning in Low-resource and Imbalanced Clinical Settings
RADS applies reinforcement learning to pick informative samples for transfer learning, improving performance over uncertainty and diversity sampling in low-resource imbalanced clinical settings.
- Diagnosing Multi-step Reasoning Failures in Black-box LLMs via Stepwise Confidence Attribution