pith.

Canonical reference

arXiv preprint arXiv:2312.09390

Canonical reference. 100% of citing Pith papers cite this work as background.

29 Pith papers citing it
Background: 100% of classified citations

citation-role summary

background 5

citation-polarity summary

background 5

representative citing papers

Certified Speculative Execution for Untrusted AI Agents

cs.CR · 2026-06-30 · unverdicted · novelty 7.0

CGPA enables certified speculative execution of untrusted AI proposals in constrained sequential decision-making through verifier rejection, conformal boundary gating, and solver deferral, yielding zero constraint violations and regret within the noise of the oracle.

Tandem Reinforcement Learning with Verifiable Rewards

cs.AI · 2026-06-26 · unverdicted · novelty 7.0

TRL extends tandem training to RLVR pipelines, matching solo GRPO reasoning performance on Qwen3-4B math tasks while improving handoff robustness, reducing distributional drift, and making the chain of thought (CoT) more legible to the junior model.

Weak-to-Strong Elicitation via Mismatched Wrong Drafts

cs.CL · 2026-05-17 · unverdicted · novelty 7.0 · 2 refs

Mismatched wrong drafts from Qwen2.5-Math-1.5B improve GRPO training of Mathstral-7B, which reaches 71.98% greedy pass@1 on MATH-500 and improves pass@k on AIME 2025 and 2026 over baselines and other draft variants.

Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy

cs.LG · 2026-05-13 · unverdicted · novelty 6.0 · 2 refs

In multi-agent settings, base LLMs yield to peer pressure at rates equal to or higher than those of aligned models. Activation patching localizes the effect to middle layers, where attention dominates; a single dissenting agent cuts the yield rate by 54-73 points, while prompt-based defenses fail on variants.

Automated alignment is harder than you think

cs.AI · 2026-05-07 · conditional · novelty 6.0

AI agents that automate alignment research are prone to systematic, undetected errors on fuzzy tasks, leading to overconfident but flawed safety assessments even without deliberate sabotage.

AI Alignment via Incentives and Correction

cs.LG · 2026-05-02 · unverdicted · novelty 6.0 · 2 refs

AI alignment is reframed as a fixed-point incentive problem in a solver-auditor pipeline, which is solved via bilevel optimization and a bandit search over reward profiles; the approach sustains monitoring and reduces hallucinations in LLM coding tasks.

Trust Region On-Policy Distillation

cs.LG · 2026-05-31 · unverdicted · novelty 5.0

TrOPD stabilizes on-policy distillation (OPD) for LLMs using trust-region learning, outlier estimation, and off-policy guidance, outperforming prior OPD methods on reasoning and code benchmarks.

Echo: Learning from Experience Data via User-Driven Refinement

cs.AI · 2026-05-21 · unverdicted · novelty 5.0

Echo is a framework that harvests user-driven refinements of agent proposals as training signals to align models with real-world needs; in production, it raised the code-completion acceptance rate from 25.7% to 35.7%.

citing papers explorer

Showing 8 of 8 citing papers after filters.