Uncertainty-aware reward model: Teaching reward models to know what is unknown

Xingzhou Lou, Dong Yan, Wei Shen, Yuzi Yan, Jian Xie, Junge Zhang · 2024 · arXiv 2410.00847

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 2

citation-polarity summary: background 2

representative citing papers

Variance-aware Reward Modeling with Anchor Guidance

stat.ML · 2026-05-12 · unverdicted · novelty 7.0

Anchor-guided variance-aware reward modeling uses two response-level anchors to resolve non-identifiability in Gaussian models of pluralistic preferences, yielding provable identification, a joint training objective, and improved RLHF performance.

Test-Time Personalization: A Diagnostic Framework and Probabilistic Fix for Scaling Failures

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

Test-time scaling for personalized LLMs follows a logarithmic utility curve under oracle selection but standard reward models suffer user-level collapse and query-level hacking; a probabilistic reward model with learned variance enables consistent scaling.

Uncertainty Propagation in LLM-Based Systems

cs.SE · 2026-04-26 · unverdicted · novelty 7.0

This paper introduces a systems-level conceptual framing and a three-level taxonomy (intra-model, system-level, socio-technical) for uncertainty propagation in compound LLM applications, along with engineering insights and open challenges.

DynaCF: Mitigating Shortcut Learning in Reward Models via Dynamic Counterfactual Sensitivity

cs.LG · 2026-06-08 · unverdicted · novelty 5.0

DynaCF dynamically downweights shortcut-sensitive samples in reward model training by tracking margin shifts under online counterfactual perturbations within the Bradley-Terry loss.

Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs

cs.AI · 2024-10-24 · unverdicted · novelty 4.0

Data-centric filtering yields an 80K preference dataset and reward models that lead RewardBench while boosting other top entries.

World Models: A Comprehensive Survey of Architectures, Methodologies, Reasoning Paradigms, and Applications

cs.LG · 2026-05-28 · unverdicted · novelty 3.0

The paper delivers a multi-axis taxonomy for world models that maps architectures, training families, reasoning strategies, and domains from early cognitive foundations through systems such as Dreamer, MuZero, and Sora while noting evaluation gaps.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Test-Time Personalization: A Diagnostic Framework and Probabilistic Fix for Scaling Failures

cs.LG · 2026-05-09 · unverdicted · none · ref 20

Test-time scaling for personalized LLMs follows a logarithmic utility curve under oracle selection but standard reward models suffer user-level collapse and query-level hacking; a probabilistic reward model with learned variance enables consistent scaling.

Uncertainty Propagation in LLM-Based Systems

cs.SE · 2026-04-26 · unverdicted · none · ref 54

This paper introduces a systems-level conceptual framing and a three-level taxonomy (intra-model, system-level, socio-technical) for uncertainty propagation in compound LLM applications, along with engineering insights and open challenges.
