hub Canonical reference

Laminar: A scalable asyn- chronous rl post-training framework

· 2025 · arXiv 2510.12633

Canonical reference. 100% of citing Pith papers cite this work as background.

15 Pith papers citing it

Background 100% of classified citations

read on arXiv browse 15 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 5

citation-polarity summary

background 5

representative citing papers

Harnessing Routing Foresight for Micro-step-level MoE load balancing in RL Post-training

cs.DC · 2026-06-10 · unverdicted · novelty 7.0

ForeMoE uses routing foresight from the rollout stage to enable micro-step load balancing in MoE RL post-training via a hierarchical planner and transfer engine, claiming up to 1.45x speedup on 64 GPUs.

RL-VLA$^3$: A Flexible and Asynchronous Reinforcement Learning Framework for VLA Training

cs.AI · 2026-02-05 · unverdicted · novelty 7.0

RL-VLA³ is an asynchronous RL framework for VLA training that delivers up to 85.2% higher throughput than synchronous baselines while preserving identical sample efficiency and scaling to 256 GPUs.

Retroactive Advantage Correction: Closed-Form V-Trace Bias Correction for Delay-Aware RLHF

cs.LG · 2026-06-25 · unverdicted · novelty 6.0

RAC is a closed-form bias correction for delayed rewards in RLHF that is unbiased under full mass reinjection of the delay kernel and reduces to V-trace with no delay.

AsyncOPD: How Stale Can On-Policy Distillation Be?

cs.LG · 2026-06-23 · conditional · novelty 6.0

AsyncOPD shows asynchronous OPD training reaches 1.6-3.8x higher throughput than synchronous baselines with comparable accuracy by using forward-KL estimators and multi-sample Monte Carlo correction for finite teacher caches.

Spotlight: Synergizing Seed Exploration and Spot GPUs for DiT RL Post-Training

cs.DC · 2026-06-17 · unverdicted · novelty 6.0

Spotlight achieves 4x faster DiT RL post-training on spot GPUs via stale-weight exploration and elastic sequence parallelism, cutting costs 1.4-6.4x with better image quality.

AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents

cs.LG · 2026-06-04 · unverdicted · novelty 6.0

AsyncWebRL reports up to 2.9x training speedup and new SOTA on WebGym OOD split via async overlap plus constant normalizer in GRPO, with largest gains on harder tasks.

Diagnosing Training Inference Mismatch in LLM Reinforcement Learning

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

Training-inference mismatch in separated rollout and optimization stages of LLM RL can independently cause training collapse.

Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Missing old logits in async agentic RL entangle discrepancy and staleness terms in PPO off-policy correction; exact acquisition methods and revised PPO-EWMA restore decoupled updates with reported gains in speed and performance.

FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

FlashEvolve accelerates LLM agent self-evolution via asynchronous stage orchestration and inspectable language-space staleness handling, reporting 3.5-4.9x proposal throughput gains over synchronous baselines on GEPA workloads.

ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL

cs.DC · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

ROSE is a system for cooperative elasticity that co-locates serving and rollout models on shared GPUs, delivering 1.3-3.3x higher end-to-end throughput than fixed-resource baselines while preserving serving SLOs.

DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training

cs.LG · 2026-04-29 · unverdicted · novelty 6.0

DORA's multi-version streaming rollout enables 2-3x higher throughput in asynchronous RL for LLMs while preserving convergence by maintaining policy consistency, data integrity, and bounded staleness.

JigsawRL: Assembling RL Pipelines for Efficient LLM Post-Training

cs.LG · 2026-04-26 · unverdicted · novelty 6.0

JigsawRL achieves up to 1.85x higher throughput in LLM RL pipelines via pipeline multiplexing, sub-stage graphs, and look-ahead scheduling compared to prior systems.

TensorHub: Scalable and Elastic Weight Transfer for LLM RL Training

cs.DC · 2026-04-10 · unverdicted · novelty 6.0

TensorHub uses Reference-Oriented Storage to enable scalable weight transfer in LLM RL training by referencing replicated GPU weights, achieving up to 19x reduction in cross-datacenter stall time.

Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning

cs.DC · 2025-11-18 · unverdicted · novelty 6.0

Seer improves synchronous LLM RL rollout throughput by up to 2.04x and reduces long-tail latency by 72-94% via divided rollout, context-aware scheduling, and adaptive grouped speculative decoding based on prompt similarity observations.

Libra: Efficient Resource Management for Agentic RL Post-Training

cs.LG · 2026-06-02 · unverdicted · novelty 4.0

Libra optimizes GPU allocation across rollout and training in agentic RL via an elastic hybrid pool and C-MLFQ scheduler based on tool-return causal signals, claiming up to 3.0x throughput and 2.5x faster reward convergence on 48 A800 GPUs.

citing papers explorer

Showing 15 of 15 citing papers.

Harnessing Routing Foresight for Micro-step-level MoE load balancing in RL Post-training cs.DC · 2026-06-10 · unverdicted · none · ref 51
ForeMoE uses routing foresight from the rollout stage to enable micro-step load balancing in MoE RL post-training via a hierarchical planner and transfer engine, claiming up to 1.45x speedup on 64 GPUs.
RL-VLA$^3$: A Flexible and Asynchronous Reinforcement Learning Framework for VLA Training cs.AI · 2026-02-05 · unverdicted · none · ref 18
RL-VLA³ is an asynchronous RL framework for VLA training that delivers up to 85.2% higher throughput than synchronous baselines while preserving identical sample efficiency and scaling to 256 GPUs.
Retroactive Advantage Correction: Closed-Form V-Trace Bias Correction for Delay-Aware RLHF cs.LG · 2026-06-25 · unverdicted · none · ref 19
RAC is a closed-form bias correction for delayed rewards in RLHF that is unbiased under full mass reinjection of the delay kernel and reduces to V-trace with no delay.
AsyncOPD: How Stale Can On-Policy Distillation Be? cs.LG · 2026-06-23 · conditional · none · ref 18
AsyncOPD shows asynchronous OPD training reaches 1.6-3.8x higher throughput than synchronous baselines with comparable accuracy by using forward-KL estimators and multi-sample Monte Carlo correction for finite teacher caches.
Spotlight: Synergizing Seed Exploration and Spot GPUs for DiT RL Post-Training cs.DC · 2026-06-17 · unverdicted · none · ref 18
Spotlight achieves 4x faster DiT RL post-training on spot GPUs via stale-weight exploration and elastic sequence parallelism, cutting costs 1.4-6.4x with better image quality.
AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents cs.LG · 2026-06-04 · unverdicted · none · ref 29
AsyncWebRL reports up to 2.9x training speedup and new SOTA on WebGym OOD split via async overlap plus constant normalizer in GRPO, with largest gains on harder tasks.
Diagnosing Training Inference Mismatch in LLM Reinforcement Learning cs.LG · 2026-05-14 · unverdicted · none · ref 49
Training-inference mismatch in separated rollout and optimization stages of LLM RL can independently cause training collapse.
Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction cs.LG · 2026-05-12 · unverdicted · none · ref 20
Missing old logits in async agentic RL entangle discrepancy and staleness terms in PPO off-policy correction; exact acquisition methods and revised PPO-EWMA restore decoupled updates with reported gains in speed and performance.
FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration cs.LG · 2026-05-08 · unverdicted · none · ref 24
FlashEvolve accelerates LLM agent self-evolution via asynchronous stage orchestration and inspectable language-space staleness handling, reporting 3.5-4.9x proposal throughput gains over synchronous baselines on GEPA workloads.
ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL cs.DC · 2026-05-07 · unverdicted · none · ref 59 · 2 links
ROSE is a system for cooperative elasticity that co-locates serving and rollout models on shared GPUs, delivering 1.3-3.3x higher end-to-end throughput than fixed-resource baselines while preserving serving SLOs.
DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training cs.LG · 2026-04-29 · unverdicted · none · ref 14
DORA's multi-version streaming rollout enables 2-3x higher throughput in asynchronous RL for LLMs while preserving convergence by maintaining policy consistency, data integrity, and bounded staleness.
JigsawRL: Assembling RL Pipelines for Efficient LLM Post-Training cs.LG · 2026-04-26 · unverdicted · none · ref 44
JigsawRL achieves up to 1.85x higher throughput in LLM RL pipelines via pipeline multiplexing, sub-stage graphs, and look-ahead scheduling compared to prior systems.
TensorHub: Scalable and Elastic Weight Transfer for LLM RL Training cs.DC · 2026-04-10 · unverdicted · none · ref 35
TensorHub uses Reference-Oriented Storage to enable scalable weight transfer in LLM RL training by referencing replicated GPU weights, achieving up to 19x reduction in cross-datacenter stall time.
Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning cs.DC · 2025-11-18 · unverdicted · none · ref 34
Seer improves synchronous LLM RL rollout throughput by up to 2.04x and reduces long-tail latency by 72-94% via divided rollout, context-aware scheduling, and adaptive grouped speculative decoding based on prompt similarity observations.
Libra: Efficient Resource Management for Agentic RL Post-Training cs.LG · 2026-06-02 · unverdicted · none · ref 43
Libra optimizes GPU allocation across rollout and training in agentic RL via an elastic hybrid pool and C-MLFQ scheduler based on tool-return causal signals, claiming up to 3.0x throughput and 2.5x faster reward convergence on 48 A800 GPUs.

Laminar: A scalable asyn- chronous rl post-training framework

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer