Reinforcement learning from user feedback

16 E · 2025 · arXiv 2505.14946

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Echo: Learning from Experience Data via User-Driven Refinement

cs.AI · 2026-05-21 · unverdicted · novelty 5.0

Echo is a framework that harvests user-driven refinements of agent proposals as training signals to align models with real-world needs, demonstrated by raising code completion acceptance from 25.7% to 35.7% in production.

Don't Let Bandit Feedback Pull Continual LLM-Recommender Updates Off Target

cs.LG · 2026-05-17 · unverdicted · novelty 5.0

ABPO combines group-relative policy optimization with anchored exposure correction and asymmetric feedback handling to enable effective continual updates for LLM recommenders under bandit feedback constraints.

Improve Large Language Model Systems with User Logs

cs.CL · 2026-02-06

citing papers explorer


Echo: Learning from Experience Data via User-Driven Refinement

cs.AI · 2026-05-21 · unverdicted · none · ref 12

Don't Let Bandit Feedback Pull Continual LLM-Recommender Updates Off Target

cs.LG · 2026-05-17 · unverdicted · none · ref 6

Improve Large Language Model Systems with User Logs

cs.CL · 2026-02-06 · unreviewed · ref 11
