A survey that maps safety risks in personalized LLMs, introduces a unified taxonomy, and highlights three structural inadequacies in existing research on user-invariant safety, isolated techniques, and short-term evaluations.
arXiv preprint arXiv:2503.06358 (2025)
2 Pith papers cite this work.
Fields: cs.AI (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
PAFO applies Pareto fairness optimization and group-specialized distillation to produce a single personalized reward model that improves accuracy for both majority and minority preference groups without requiring group labels at inference.
citing papers explorer
- Personalization Meets Safety: Mechanisms, Risks, and Mitigations in Personalized LLMs
A survey that maps safety risks in personalized LLMs, introduces a unified taxonomy, and highlights three structural inadequacies in existing research on user-invariant safety, isolated techniques, and short-term evaluations.
- PAFO: Pareto Fairness Optimization for Personalized Reward Modeling
PAFO applies Pareto fairness optimization and group-specialized distillation to produce a single personalized reward model that improves accuracy for both majority and minority preference groups without requiring group labels at inference.