hub Mixed citations

Tool-star: Empowering LLM-brained multi-tool reasoner via reinforcement learning

Mixed citation behavior. Most common role is background (50%).

15 Pith papers citing it
Background 50% of classified citations

hub tools

citation-role summary

background: 5 · baseline: 1 · dataset: 1 · other: 1

years

2026: 12 · 2025: 3

representative citing papers

Learning Agentic Policy from Action Guidance

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

ActGuide-RL uses human action data as plan-style guidance in mixed-policy RL to overcome exploration barriers in LLM agents, matching SFT+RL performance on search benchmarks without cold-start training.

Teaching Language Models to Think in Code

cs.CL · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

ThinC trains small models to reason primarily in code rather than natural language, outperforming tool-integrated baselines and even larger models on competition math benchmarks.

A Survey of Context Engineering for Large Language Models

cs.CL · 2025-07-17 · accept · novelty 4.0

The survey organizes Context Engineering into retrieval, processing, management, and integrated systems like RAG and multi-agent setups while identifying an asymmetry where LLMs handle complex inputs well but struggle with equally sophisticated long outputs.

citing papers explorer

Showing 5 of 5 citing papers after filters.

  • Learning Agentic Policy from Action Guidance · cs.CL · 2026-05-12 · unverdicted · none · ref 13

    ActGuide-RL uses human action data as plan-style guidance in mixed-policy RL to overcome exploration barriers in LLM agents, matching SFT+RL performance on search benchmarks without cold-start training.

  • Teaching Language Models to Think in Code · cs.CL · 2026-05-08 · unverdicted · none · ref 3 · 2 links

    ThinC trains small models to reason primarily in code rather than natural language, outperforming tool-integrated baselines and even larger models on competition math benchmarks.

  • ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents · cs.AI · 2026-05-12 · unverdicted · none · ref 7

    ToolCUA introduces a trajectory scaling pipeline and staged RL to optimize GUI-tool switching, reaching 46.85% accuracy on OSWorld-MCP for a 66% relative gain over baseline.

  • Multi-Modal Multi-Agent Reinforcement Learning for Radiology Report Generation · cs.CV · 2026-02-17 · unverdicted · none · ref 27

    MARL-Rad trains region-specific and global agents with reinforcement learning on clinical rewards to produce more accurate radiology reports than prior methods on MIMIC-CXR and IU X-ray datasets.

  • A Survey of Context Engineering for Large Language Models · cs.CL · 2025-07-17 · accept · none · ref 231

    The survey organizes Context Engineering into retrieval, processing, management, and integrated systems like RAG and multi-agent setups while identifying an asymmetry where LLMs handle complex inputs well but struggle with equally sophisticated long outputs.