Mastering the game of Go without human knowledge. Nature, 550(7676):354–359.

4 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary: background (2), method (1)

years: 2026 (3), 2025 (1)

representative citing papers

Bounded Ratio Reinforcement Learning

cs.LG · 2026-04-20 · conditional · novelty 7.0

BRRL derives an analytic optimal policy for regularized, constrained RL that guarantees monotonic policy improvement; from it the authors derive the BPO algorithm, which matches or exceeds PPO.

Group-in-Group Policy Optimization for LLM Agent Training

cs.LG · 2025-05-16 · unverdicted · novelty 7.0

GiGPO adds a hierarchical grouping mechanism to group-based RL so that LLM agents receive both global trajectory-level and local step-level credit signals, yielding gains of more than 12% on ALFWorld and more than 9% on WebShop over GRPO while keeping the same rollout and memory footprint.