DCPO: Dynamic Clipping Policy Optimization

10 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background: 2 · other: 1

citation-polarity summary

years

2026: 9 · 2025: 1

polarities

background: 2 · unclear: 1

representative citing papers

Revisiting DAgger in the Era of LLM-Agents

cs.LG · 2026-05-13 · conditional · novelty 6.0

DAgger-style training with turn-level policy interpolation raises 4B and 8B LLM agents to 27.3% and 29.8% on SWE-bench Verified, beating several larger published systems.

SSPO: Subsentence-level Policy Optimization

cs.CL · 2025-11-06 · unverdicted · novelty 6.0

SSPO computes policy importance ratios at the subsentence level with entropy-adjusted clipping bounds, yielding higher average scores than GRPO and GSPO on math reasoning benchmarks with Qwen models.

Baichuan-M4: A Clinical-Grade Medical Agent System for Continuous Care

cs.AI · 2026-06-08 · unverdicted · novelty 3.0

The paper describes Baichuan-M4, a coordinated medical agent system that reports leading scores across static knowledge, dynamic consultation, long-context memory, retrieval, OCR, and multimodal tasks with a 3.3% hallucination rate.
