Echo is a framework that harvests user-driven refinements of agent proposals as training signals to align models with real-world needs, demonstrated by raising code completion acceptance from 25.7% to 35.7% in production.
Reinforcement learning from user feedback
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
years
2026 3representative citing papers
ABPO combines group-relative policy optimization with anchored exposure correction and asymmetric feedback handling to enable effective continual updates for LLM recommenders under bandit feedback constraints.
citing papers explorer
-
Echo: Learning from Experience Data via User-Driven Refinement
Echo is a framework that harvests user-driven refinements of agent proposals as training signals to align models with real-world needs, demonstrated by raising code completion acceptance from 25.7% to 35.7% in production.
-
Don't Let Bandit Feedback Pull Continual LLM-Recommender Updates Off Target
ABPO combines group-relative policy optimization with anchored exposure correction and asymmetric feedback handling to enable effective continual updates for LLM recommenders under bandit feedback constraints.
- Improve Large Language Model Systems with User Logs