Title resolution pending

Xuechen Li, Tianyi Zhang, Yann Dubois, Rohan Taori, Ishaan Gulrajani, Carlos Guestrin, Percy Liang, Tatsunori B · arXiv 2305.18438

2 Pith papers cite this work.

representative citing papers

Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback

cs.LG · 2026-03-30 · unverdicted · novelty 7.0

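Introduces corruption-robust estimators for linear Markov games in offline multi-agent reinforcement learning from human feedback (MARLHF). These estimators achieve O(ε^{1-o(1)}) or O(√ε) bounds on the Nash-equilibrium or coarse-correlated-equilibrium (CCE) gap under uniform or unilateral coverage.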

Provably avoiding over-optimization in Direct Preference Optimization without knowing the data distribution

cs.LG · 2026-02-05 · unverdicted · novelty 5.0 · 2 refs

PEPO is a single-step, pessimistic, ensemble-based algorithm for direct preference optimization (DPO) that provably avoids over-optimization. Its guarantee depends only on single-policy concentrability, and it requires neither knowledge of the data distribution nor an explicitly learned reward model.
