VitaBench 2.0 introduces a benchmark for long-term personalized and proactive agent behavior, with results indicating substantial gaps in current frontier LLMs.
Teach LLMs to personalize – an approach inspired by writing education. arXiv preprint arXiv:2308.07968, 2023
3 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: 3 unverdicted
Representative citing papers
VAC replaces scalar rewards with natural language feedback in an alternating training loop between a feedback model and a policy model, yielding better personalized QA on the LaMP-QA benchmark.
TAP-PER encodes user preferences as lightweight learnable prefix embeddings that outperform prompt-based and adapter-based baselines on LaMP tasks with 130x fewer per-user parameters.
citing papers explorer
Learning from Natural Language Feedback for Personalized Question Answering