RubricRefine is a training-free pre-execution method that creates rubrics to score and fix inter-tool contract violations in agent code, reaching 0.86 average on M3ToolEval across seven models with zero executions and lower latency.
hub
Nemotron-Research-Tool-N1: Exploring tool-using language models with reinforced reasoning
13 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
COVERT generates verifiable synthetic tool-use environments for RL by validated trajectory synthesis and oracle-preserving augmentations, improving tool-use accuracy on BFCL v3 and ACEBench while remaining complementary to SFT.
MURPHY improves code generation pass rates by up to 6% through retrospective credit assignment on multi-turn feedback trees using max or mean reward propagation.
PROVE trains LLMs on multi-step tool calls using 20 live MCP servers with 343 tools, state-grounded synthesis, and adaptive efficiency rewards, delivering gains of up to 10.2 points on BFCL Multi-Turn and similar on other benchmarks.
Tool-calling evaluations for LLM agents are highly sensitive to implementation details such as random seeds and history handling, and two new techniques accelerate RL training with wall-clock speedup and no performance degradation.
Entropy polarity is a signed token-level quantity derived from a first-order approximation of entropy change that predicts whether RL updates expand or contract policy entropy in LLM fine-tuning, revealing an asymmetry between high- and low-probability tokens.
CuraView detects sentence-level faithfulness hallucinations in medical discharge summaries via GraphRAG knowledge graphs and multi-agent evidence grading, achieving 0.831 F1 on critical contradictions with a fine-tuned Qwen3-14B model and 50% relative improvement over baselines.
TRUSTEE uses an 8B LM to simulate complete dynamic environments for RL-based tool learning and outperforms baselines that require extra external resources.
LAST augments MLLMs with a tool-abstraction sandbox and three-stage training to deliver around 20% gains on spatial reasoning tasks, outperforming closed-source models.
RL post-training lifts answer correctness on FHIR-AgentBench from 50% (o4-mini) to 77% with a cheaper Qwen3-8B CodeAct agent.
Webscale-RL generates 1.2M verifiable QA pairs from pretraining corpora, enabling RL training that matches continual pretraining performance with up to 100x fewer tokens.
EKSFT masks high-entropy or high-KL tokens in low-data SFT to preserve pre-trained distribution and improve downstream RL performance on math reasoning tasks.
citing papers explorer
-
CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification
CuraView detects sentence-level faithfulness hallucinations in medical discharge summaries via GraphRAG knowledge graphs and multi-agent evidence grading, achieving 0.831 F1 on critical contradictions with a fine-tuned Qwen3-14B model and 50% relative improvement over baselines.