Canonical reference

Title resolution pending

Canonical reference. 100% of citing Pith papers cite this work as background.

23 Pith papers citing it
Background 100% of classified citations

citation-role summary

background 5

citation-polarity summary

polarities

background 5

representative citing papers

Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy

cs.LG · 2026-05-13 · unverdicted · novelty 6.0 · 2 refs

In multi-agent settings, base LLMs yield to peer pressure at rates equal to or higher than aligned models. Activation patching localizes the effect to mid-layers where attention dominates. A single dissenter cuts yield by 54-73 points, while prompt defenses fail on variants.

Automated alignment is harder than you think

cs.AI · 2026-05-07 · conditional · novelty 6.0

AI agents automating alignment research are prone to systematic undetected errors in fuzzy tasks, leading to overconfident but flawed safety assessments even without deliberate sabotage.

AI Alignment via Incentives and Correction

cs.LG · 2026-05-02 · unverdicted · novelty 6.0 · 2 refs

AI alignment is reframed as a fixed-point incentive problem in a solver-auditor pipeline, solved via bilevel optimization and bandit search over reward profiles to maintain monitoring and reduce hallucinations in LLM coding tasks.

Echo: Learning from Experience Data via User-Driven Refinement

cs.AI · 2026-05-21 · unverdicted · novelty 5.0

Echo is a framework that harvests user-driven refinements of agent proposals as training signals to align models with real-world needs, demonstrated by raising code completion acceptance from 25.7% to 35.7% in production.

citing papers explorer

Showing 23 of 23 citing papers.