hub

Nemotron-Research-Tool-N1: Exploring tool-using language models with reinforced reasoning

Shaokun Zhang, Yi Dong, Jieyu Zhang, Jan Kautz, Bryan Catanzaro, Andrew Tao · 2025 · arXiv 2505.00024

13 Pith papers cite this work. Polarity classification is still indexing.

13 Pith papers citing it

read on arXiv browse 13 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 1 method 1

citation-polarity summary

background 1 use method 1

representative citing papers

RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement

cs.LG · 2026-05-10 · unverdicted · novelty 7.0 · 3 refs

RubricRefine is a training-free pre-execution method that creates rubrics to score and fix inter-tool contract violations in agent code, reaching 0.86 average on M3ToolEval across seven models with zero executions and lower latency.

Controllable and Verifiable Tool-Use Data Synthesis for Agentic Reinforcement Learning

cs.AI · 2026-04-10 · unverdicted · novelty 7.0

COVERT generates verifiable synthetic tool-use environments for RL by validated trajectory synthesis and oracle-preserving augmentations, improving tool-use accuracy on BFCL v3 and ACEBench while remaining complementary to SFT.

MURPHY: Feedback-Aware GRPO with Retrospective Credit Assignment for Multi-Turn Code Generation

cs.LG · 2025-11-11 · unverdicted · novelty 7.0

MURPHY improves code generation pass rates by up to 6% through retrospective credit assignment on multi-turn feedback trees using max or mean reward propagation.

Synthesize and Reward -- Reinforcement Learning for Multi-Step Tool Use in Live Environments

cs.CL · 2026-06-02 · unverdicted · novelty 6.0

PROVE trains LLMs on multi-step tool calls using 20 live MCP servers with 343 tools, state-grounded synthesis, and adaptive efficiency rewards, delivering gains of up to 10.2 points on BFCL Multi-Turn and similar on other benchmarks.

On Effectiveness and Efficiency of Agentic Tool-calling and RL Training

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

Tool-calling evaluations for LLM agents are highly sensitive to implementation details such as random seeds and history handling, and two new techniques accelerate RL training with wall-clock speedup and no performance degradation.

Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

Entropy polarity is a signed token-level quantity derived from a first-order approximation of entropy change that predicts whether RL updates expand or contract policy entropy in LLM fine-tuning, revealing an asymmetry between high- and low-probability tokens.

CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification

cs.CL · 2026-05-05 · unverdicted · novelty 6.0

CuraView detects sentence-level faithfulness hallucinations in medical discharge summaries via GraphRAG knowledge graphs and multi-agent evidence grading, achieving 0.831 F1 on critical contradictions with a fine-tuned Qwen3-14B model and 50% relative improvement over baselines.

Democratizing Tool Learning with Environments Fully Simulated by a Free 8B Language Model

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

TRUSTEE uses an 8B LM to simulate complete dynamic environments for RL-based tool learning and outperforms baselines that require extra external resources.

LAST: Leveraging Tools as Hints to Enhance Spatial Reasoning for Multimodal Large Language Models

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

LAST augments MLLMs with a tool-abstraction sandbox and three-stage training to deliver around 20% gains on spatial reasoning tasks, outperforming closed-source models.

Reinforcement Learning for Tool-Calling Agents in Fast Healthcare Interoperability Resources (FHIR)

cs.LG · 2026-05-13 · unverdicted · novelty 5.0

RL post-training lifts answer correctness on FHIR-AgentBench from 50% (o4-mini) to 77% with a cheaper Qwen3-8B CodeAct agent.

Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels

cs.CL · 2025-10-07 · unverdicted · novelty 5.0

Webscale-RL generates 1.2M verifiable QA pairs from pretraining corpora, enabling RL training that matches continual pretraining performance with up to 100x fewer tokens.

Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models

cs.AI · 2026-05-28 · unverdicted · novelty 4.0

EKSFT masks high-entropy or high-KL tokens in low-data SFT to preserve pre-trained distribution and improve downstream RL performance on math reasoning tasks.

R2IF: Aligning Reasoning with Decisions via Composite Rewards for Interpretable LLM Function Calling

cs.LG · 2026-04-22

citing papers explorer

Showing 1 of 1 citing paper after filters.

CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification cs.CL · 2026-05-05 · unverdicted · none · ref 44
CuraView detects sentence-level faithfulness hallucinations in medical discharge summaries via GraphRAG knowledge graphs and multi-agent evidence grading, achieving 0.831 F1 on critical contradictions with a fine-tuned Qwen3-14B model and 50% relative improvement over baselines.

Nemotron-Research-Tool-N1: Exploring tool-using language models with reinforced reasoning

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer