DReST training makes RL agents and LLMs neutral about trajectory length while remaining useful at pursuing goals, and it generalizes: in out-of-distribution tests it halves the probability of influencing shutdown.
month = apr, year =
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Representative citing papers
Conditional Graph Diffusion generates continuous negotiation outcomes with high individual rationality using GATv2 encoders, cross-attention fusion, and inference-time normative guidance gradients.
DISCA converts within-country disagreement among World Values Survey personas into a bounded logit correction that reduces cultural misalignment by 10-24% on MultiTP for models 3.8B and larger across 20 countries, without any weight updates.
Global Bradley-Terry rankings of LLMs are misleading due to structured heterogeneity in user preferences, and small (λ, ν)-portfolios recover coherent subpopulations that cover over 96% of votes with just five rankings.
citing papers explorer
- Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs