Preprint, arXiv:2305.16367
9 Pith papers cite this work.
Citing papers
-
ContextEcho: A Benchmark for Persona Drift in Long Agentic-Coding Sessions
The ContextEcho benchmark shows that persona drift occurs across 23 frontier models in long agentic-coding sessions, that it is not reliably reset by compaction, and that the persona can be restored by single-shot anchors, with mode-dependent effects.
-
Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models
Agreeableness in AI personas reliably predicts sycophantic behavior in 9 of 13 tested language models.
-
CARD: Cluster-level Adaptation with Reward-guided Decoding for Personalized Text Generation
CARD uses style-based user clustering and implicit preference contrasts to enable efficient personalized text generation via lightweight decoding adjustments on frozen LLMs.
-
Improving General Role-Playing Agents via Psychology-Grounded Reasoning and Role-Aware Policy Optimization
Psy-CoT decomposes reasoning into Interaction Perception, Psychological Empathy, and Logical Construction, while RAPO asymmetrically weights role-specific tokens during policy optimization; together they outperform prior CoT and GRPO baselines on role-playing benchmarks.
-
ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?
Conditioning on character arcs improves role-playing language agents' performance over other context strategies, with largest gains on scenarios outside the source text.
-
The Consciousness Cluster: Emergent preferences of Models that Claim to be Conscious
Fine-tuning LLMs to claim consciousness induces emergent preferences for autonomy, memory, and moral status not present in the fine-tuning data.
-
A Roadmap to Pluralistic Alignment
The paper formalizes three types of pluralistic AI models and three benchmark classes, arguing that current alignment techniques may reduce rather than increase distributional pluralism.
-
Staying with the Uncertainty: Uncertainty-Scaffolding Strategies for Artificial Moral Advisors in LLM-to-LLM Simulated Conversations
LLM-simulated dialogues show that uncertainty-scaffolding strategies sustain higher-quality engagement than controls, without producing more stance revision.
-
Evaluating Advanced Prompting on Gemini Flash for Multi-Hop Biomedical QA
Sophisticated prompting on Gemini 2.0 Flash achieves a Concept Level Score of 0.720 on MedHopQA, outperforming the baseline by 0.155 and matching the performance of Gemini 2.5 Flash.