Reinforcement Learning for Long-Horizon Interactive LLM Agents

Canonical reference. 71% of citing Pith papers cite this work as background.

30 Pith papers citing it

citation-role summary: background 7

citation-polarity summary: background 5, unclear 2

years: 2026 (26), 2025 (4)

representative citing papers

Group-in-Group Policy Optimization for LLM Agent Training

cs.LG · 2025-05-16 · unverdicted · novelty 7.0

GiGPO adds a hierarchical grouping mechanism to group-based RL so that LLM agents receive both global trajectory and local step-level credit signals, yielding >12% gains on ALFWorld and >9% on WebShop over GRPO while keeping the same rollout and memory footprint.

Rank-Then-Act: Reward-Free Control from Frame-Order Progress

cs.LG · 2026-07-02 · unverdicted · novelty 6.0

RTA trains a VLM as an ordinal progress scorer via GRPO on shuffled expert frames and uses Spearman rank correlation with temporal indices as a bounded RL reward, matching or exceeding prior video reward methods on discrete and continuous control benchmarks.

Diagnosing Task Insensitivity in Language Agents

cs.AI · 2026-06-25 · unverdicted · novelty 6.0

The paper diagnoses task insensitivity in LLM agents as a cause of weak OOD generalization, links it to attention drift, and proposes Task-Perturbed NLL Optimization as a contrastive regularizer to improve task dependence.

A Survey on LLM-based Conversational User Simulation

cs.CL · 2026-04-27 · unverdicted · novelty 6.0

A survey that introduces a taxonomy for LLM-based conversational user simulation, analyzes core techniques and evaluation methods, and identifies open challenges in the field.

WorldSample: Closed-loop Real-robot RL with World Modelling

cs.RO · 2026-07-02 · unverdicted · novelty 5.0

WorldSample generates synthetic transitions from a post-trained world model grounded in real rollouts and uses Policy-Paced Learning to improve RL policies, reporting 28% higher success rates and 59% fewer training steps on contact-rich robot tasks.

Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

cs.AI · 2026-05-07 · unverdicted · novelty 5.0 · 3 refs

Skill1 trains a single RL policy to co-evolve skill selection, utilization, and distillation in language model agents from one task-outcome reward, using low-frequency trends to credit selection and high-frequency variation to credit distillation, outperforming baselines on ALFWorld and WebShop.

citing papers explorer

Showing 30 of 30 citing papers.