RubricRefine is a training-free pre-execution method that creates rubrics to score and fix inter-tool contract violations in agent code, reaching 0.86 average on M3ToolEval across seven models with zero executions and lower latency.
hub
Nemotron-Research-Tool-N1: Exploring tool-using language models with reinforced reasoning
12 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
COVERT generates verifiable synthetic tool-use environments for RL by validated trajectory synthesis and oracle-preserving augmentations, improving tool-use accuracy on BFCL v3 and ACEBench while remaining complementary to SFT.
MURPHY improves code generation pass rates by up to 6% through retrospective credit assignment on multi-turn feedback trees using max or mean reward propagation.
Tool-calling evaluations for LLM agents are highly sensitive to implementation details such as random seeds and history handling, and two new techniques accelerate RL training with wall-clock speedup and no performance degradation.
Entropy polarity is a signed token-level quantity derived from a first-order approximation of entropy change that predicts whether RL updates expand or contract policy entropy in LLM fine-tuning, revealing an asymmetry between high- and low-probability tokens.
CuraView detects sentence-level faithfulness hallucinations in medical discharge summaries via GraphRAG knowledge graphs and multi-agent evidence grading, achieving 0.831 F1 on critical contradictions with a fine-tuned Qwen3-14B model and 50% relative improvement over baselines.
TRUSTEE uses an 8B LM to simulate complete dynamic environments for RL-based tool learning and outperforms baselines that require extra external resources.
LAST augments MLLMs with a tool-abstraction sandbox and three-stage training to deliver around 20% gains on spatial reasoning tasks, outperforming closed-source models.
RL post-training lifts answer correctness on FHIR-AgentBench from 50% (o4-mini) to 77% with a cheaper Qwen3-8B CodeAct agent.
Webscale-RL generates 1.2M verifiable QA pairs from pretraining corpora, enabling RL training that matches continual pretraining performance with up to 100x fewer tokens.
EKSFT masks high-entropy or high-KL tokens in low-data SFT to preserve pre-trained distribution and improve downstream RL performance on math reasoning tasks.
citing papers explorer
-
RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement
RubricRefine is a training-free pre-execution method that creates rubrics to score and fix inter-tool contract violations in agent code, reaching 0.86 average on M3ToolEval across seven models with zero executions and lower latency.
-
Controllable and Verifiable Tool-Use Data Synthesis for Agentic Reinforcement Learning
COVERT generates verifiable synthetic tool-use environments for RL by validated trajectory synthesis and oracle-preserving augmentations, improving tool-use accuracy on BFCL v3 and ACEBench while remaining complementary to SFT.
-
MURPHY: Feedback-Aware GRPO with Retrospective Credit Assignment for Multi-Turn Code Generation
MURPHY improves code generation pass rates by up to 6% through retrospective credit assignment on multi-turn feedback trees using max or mean reward propagation.
-
On Effectiveness and Efficiency of Agentic Tool-calling and RL Training
Tool-calling evaluations for LLM agents are highly sensitive to implementation details such as random seeds and history handling, and two new techniques accelerate RL training with wall-clock speedup and no performance degradation.
-
Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control
Entropy polarity is a signed token-level quantity derived from a first-order approximation of entropy change that predicts whether RL updates expand or contract policy entropy in LLM fine-tuning, revealing an asymmetry between high- and low-probability tokens.
-
CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification
CuraView detects sentence-level faithfulness hallucinations in medical discharge summaries via GraphRAG knowledge graphs and multi-agent evidence grading, achieving 0.831 F1 on critical contradictions with a fine-tuned Qwen3-14B model and 50% relative improvement over baselines.
-
Democratizing Tool Learning with Environments Fully Simulated by a Free 8B Language Model
TRUSTEE uses an 8B LM to simulate complete dynamic environments for RL-based tool learning and outperforms baselines that require extra external resources.
-
LAST: Leveraging Tools as Hints to Enhance Spatial Reasoning for Multimodal Large Language Models
LAST augments MLLMs with a tool-abstraction sandbox and three-stage training to deliver around 20% gains on spatial reasoning tasks, outperforming closed-source models.
-
Reinforcement Learning for Tool-Calling Agents in Fast Healthcare Interoperability Resources (FHIR)
RL post-training lifts answer correctness on FHIR-AgentBench from 50% (o4-mini) to 77% with a cheaper Qwen3-8B CodeAct agent.
-
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
Webscale-RL generates 1.2M verifiable QA pairs from pretraining corpora, enabling RL training that matches continual pretraining performance with up to 100x fewer tokens.
-
Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models
EKSFT masks high-entropy or high-KL tokens in low-data SFT to preserve pre-trained distribution and improve downstream RL performance on math reasoning tasks.
- R2IF: Aligning Reasoning with Decisions via Composite Rewards for Interpretable LLM Function Calling