arXiv preprint arXiv:2508.02833

Lei Pang, Ruinan Jin · 2025 · arXiv 2508.02833

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Faster Synchronous On-Policy RL via Straggler-Aware Group Sizing

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

SAGC dynamically adjusts group sizes in synchronous GRPO and DAPO via online constrained optimization to cut stragglers, improve wall-clock speed, and maintain or improve rewards and downstream reasoning performance.

FGRPO: Federated GRPO with Adaptive Aggregation on Non-IID Data

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

FGRPO decentralizes GRPO fine-tuning via adaptive aggregation based on relative performance gain to achieve robust convergence on non-IID data while preserving privacy.

LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition

cs.AI · 2026-05-19 · unverdicted · novelty 6.0

LC-ERD frames LLM self-alignment as latent structure mining via a Variational Logic Potential and Multi-Agent Value Decomposition to provide granular, logic-consistent supervision.

Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems

cs.RO · 2026-04-22 · unverdicted · novelty 6.0

Waypoint-based bi-level planning with curriculum RLVR improves multi-robot task success rates in dense-obstacle benchmarks over motion-agnostic and VLA baselines.

citing papers explorer

Showing 4 of 4 citing papers.

Faster Synchronous On-Policy RL via Straggler-Aware Group Sizing · cs.LG · 2026-06-01 · unverdicted · none · ref 33
FGRPO: Federated GRPO with Adaptive Aggregation on Non-IID Data · cs.LG · 2026-06-02 · unverdicted · none · ref 24
LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition · cs.AI · 2026-05-19 · unverdicted · none · ref 28
Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems · cs.RO · 2026-04-22 · unverdicted · none · ref 115
