Toolformer: Language models can teach themselves to use tools

Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom · 2023

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

StepFly: Agentic Troubleshooting Guide Automation for Incident Diagnosis

cs.AI · 2025-10-11 · conditional · novelty 7.0

StepFly automates TSG execution via TSG Mentor, LLM-based DAG extraction with QPPs, and a DAG-guided parallel scheduler, reaching 94% success on GPT-4.1 with 32.9-70.4% time savings on parallelizable guides.

CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning

cs.AI · 2026-01-19 · unverdicted · novelty 6.0

CURE-MED pairs a new 13-language medical reasoning benchmark with curriculum RL to raise logical correctness to 70% and language consistency to 95% at 32B scale while outperforming baselines.

Large Language Model Agent for User-friendly Chemical Process Simulations

physics.chem-ph · 2026-01-15 · unverdicted · novelty 6.0

An LLM agent integrated with AVEVA Process Simulation via MCP enables natural-language-driven flowsheet analysis, optimization, and construction for chemical separation processes.

NaviAgent: Bilevel Planning on Tool Navigation Graph for Large-Scale Orchestration

cs.AI · 2025-06-24 · unverdicted · novelty 6.0

NaviAgent decouples task planning from tool execution via a Tool World Navigation Model graph to improve scalability and success rates in LLM agents handling large tool ecosystems.

citing papers explorer

Showing 4 of 4 citing papers.

StepFly: Agentic Troubleshooting Guide Automation for Incident Diagnosis

cs.AI · 2025-10-11 · conditional · none · ref 27

CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning

cs.AI · 2026-01-19 · unverdicted · none · ref 40

Large Language Model Agent for User-friendly Chemical Process Simulations

physics.chem-ph · 2026-01-15 · unverdicted · none · ref 24

NaviAgent: Bilevel Planning on Tool Navigation Graph for Large-Scale Orchestration

cs.AI · 2025-06-24 · unverdicted · none · ref 2
