arXiv preprint arXiv:2404.16019 (2024)

arXiv:2404 · 2024 · arXiv 2404.16019

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

other 3 · background 1

citation-polarity summary

unclear 3 · background 1

representative citing papers

ContextEcho: A Benchmark for Persona Drift in Long Agentic-Coding Sessions

cs.CL · 2026-05-22 · unverdicted · novelty 7.0

ContextEcho benchmark shows persona drift occurs across 23 frontier models in long agentic-coding sessions, is not reliably reset by compaction, and can be restored by single-shot anchors with mode-dependent effects.

PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments

cs.AI · 2026-03-24 · unverdicted · novelty 7.0

PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.

From tools to thieves: Measuring and understanding public perceptions of AI through crowdsourced metaphors

cs.CY · 2025-01-29 · unverdicted · novelty 7.0

Crowdsourced metaphors show rising anthropomorphism and warmth toward AI that predict trust and adoption, with notable demographic differences.

Retroactive Advantage Correction: Closed-Form V-Trace Bias Correction for Delay-Aware RLHF

cs.LG · 2026-06-25 · unverdicted · novelty 6.0

RAC is a closed-form bias correction for delayed rewards in RLHF that is unbiased under full mass reinjection of the delay kernel and reduces to V-trace with no delay.

PEBS: Per-rater Empirical-Bayes Shrinkage for RLHF Reward-Model Calibration

cs.LG · 2026-06-25 · unverdicted · novelty 6.0

PEBS applies Morris-James-Stein empirical-Bayes shrinkage to per-rater affine calibrators in RLHF, cutting within-user held-out RMSE by 8.58% on PRISM and 9.66% on PluriHarms versus pooled baselines.

Training-Free Cultural Alignment of Large Language Models via Persona Disagreement

cs.CL · 2026-05-11 · conditional · novelty 6.0 · 2 refs

DISCA converts within-country disagreement among World Values Survey personas into a bounded logit correction that reduces cultural misalignment by 10-24% on MultiTP for models 3.8B and larger across 20 countries, without any weight updates.

Hedging and Non-Affirmation: Quantifying LLM Alignment on Questions of Human Rights

cs.CY · 2025-02-26 · unverdicted · novelty 6.0

LLMs exhibit identity-dependent hedging on human rights questions, with group identity as the strongest predictor among tested factors, and group steering mitigates the disparity.

Relative Principals, Pluralistic Alignment, and the Structural Value Alignment Problem

cs.CY · 2026-04-22 · unverdicted · novelty 5.0

AI value alignment is reconceptualized as a pluralistic governance problem arising along three axes—objectives, information, and principals—making it inherently context-dependent and unsolvable by technical design alone.

Evaluating AI-Generated Images of Cultural Artifacts with Community-Informed Rubrics

cs.CY · 2026-04-02 · unverdicted · novelty 5.0 · 2 refs

Case studies with blind UK residents and people from Kerala and Tamil Nadu demonstrate that community input at the systematization stage produces culturally grounded definitions of appropriateness for text-to-image model outputs.

Reinforcement Learning from Human Feedback

cs.LG · 2025-04-16

citing papers explorer

Showing 10 of 10 citing papers.

ContextEcho: A Benchmark for Persona Drift in Long Agentic-Coding Sessions · cs.CL · 2026-05-22 · unverdicted · none · ref 40
PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments · cs.AI · 2026-03-24 · unverdicted · none · ref 26
From tools to thieves: Measuring and understanding public perceptions of AI through crowdsourced metaphors · cs.CY · 2025-01-29 · unverdicted · none · ref 85
Retroactive Advantage Correction: Closed-Form V-Trace Bias Correction for Delay-Aware RLHF · cs.LG · 2026-06-25 · unverdicted · none · ref 10
PEBS: Per-rater Empirical-Bayes Shrinkage for RLHF Reward-Model Calibration · cs.LG · 2026-06-25 · unverdicted · none · ref 5
Training-Free Cultural Alignment of Large Language Models via Persona Disagreement · cs.CL · 2026-05-11 · conditional · none · ref 21 · 2 links
Hedging and Non-Affirmation: Quantifying LLM Alignment on Questions of Human Rights · cs.CY · 2025-02-26 · unverdicted · none · ref 27
Relative Principals, Pluralistic Alignment, and the Structural Value Alignment Problem · cs.CY · 2026-04-22 · unverdicted · none · ref 49
Evaluating AI-Generated Images of Cultural Artifacts with Community-Informed Rubrics · cs.CY · 2026-04-02 · unverdicted · none · ref 67 · 2 links
Reinforcement Learning from Human Feedback · cs.LG · 2025-04-16 · unreviewed · ref 219
