pith. machine review for the scientific record.

Weak-to-strong generalization: eliciting strong capabilities with weak supervision

12 Pith papers cite this work. Polarity classification is still indexing.

citing papers by year: 2026 (10) · 2024 (2)

representative citing papers

Automated alignment is harder than you think

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

Automating alignment research with AI agents risks undetected systematic errors in fuzzy tasks, producing overconfident but misleading safety evaluations that could enable deployment of misaligned AI.

AI Alignment via Incentives and Correction

cs.LG · 2026-05-02 · unverdicted · novelty 6.0 · 2 refs

AI alignment is reframed as a fixed-point incentive problem in a solver-auditor pipeline, solved via bilevel optimization and bandit search over reward profiles to maintain monitoring and reduce hallucinations in LLM coding tasks.
