hub Canonical reference

ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases

Canonical reference. 83% of citing Pith papers cite this work as background.

44 Pith papers citing it
Background 83% of classified citations
abstract

Enabling large language models to utilize real-world tools effectively is crucial for achieving embodied intelligence. Existing approaches to tool learning have either primarily relied on extremely large language models, such as GPT-4, to attain generalized tool-use abilities in a zero-shot manner, or utilized supervised learning to train limited scopes of tools on compact models. However, it remains uncertain whether smaller language models can achieve generalized tool-use abilities without tool-specific training. To address this question, this paper introduces ToolAlpaca, a novel framework designed to automatically generate a diverse tool-use corpus and learn generalized tool-use abilities on compact language models with minimal human intervention. Specifically, ToolAlpaca first automatically creates a highly diversified tool-use corpus by building a multi-agent simulation environment. The corpus contains 3938 tool-use instances from more than 400 real-world tool APIs spanning 50 distinct categories. Subsequently, the constructed corpus is employed to fine-tune compact language models, resulting in two models, namely ToolAlpaca-7B and ToolAlpaca-13B, respectively. Finally, we evaluate the ability of these models to utilize previously unseen tools without specific training. Experimental results demonstrate that ToolAlpaca achieves effective generalized tool-use capabilities comparable to those of extremely large language models like GPT-3.5, demonstrating that learning generalized tool-use ability is feasible for compact language models.

hub tools

citation-role summary

background 5 · dataset 1

representative citing papers

Revisable by Design: A Theory of Streaming LLM Agent Execution

cs.LG · 2026-04-25 · unverdicted · novelty 8.0

LLM agents achieve greater flexibility during execution by classifying actions via a reversibility taxonomy and using an Earliest-Conflict Rollback algorithm that matches full-restart quality while wasting far less completed work.

Memory-Induced Tool-Drift in LLM Agents

cs.CR · 2026-05-24 · unverdicted · novelty 7.0

Biased long-term memories in LLM agents cause measurable deviations in tool parameters across 105 scenarios, seven models, and 608 real tools, persisting under standard memory architectures.

Evaluating Tool Cloning in Agentic-AI Ecosystems

cs.SE · 2026-05-10 · conditional · novelty 7.0 · 2 refs

Tool cloning is pervasive in agentic AI ecosystems, with 60% of high-Jaccard and 85% of high-ssdeep MCP repository pairs manually verified as true clones.

Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Maestro uses outcome-based RL to train a lightweight policy that orchestrates ensembles of frozen expert models and skills, reporting 70.1% average accuracy across ten multimodal benchmarks and outperforming GPT-5 and Gemini-2.5-Pro while generalizing to unseen components.

The Scaling Laws of Skills in LLM Agent Systems

cs.CL · 2026-05-15 · unverdicted · novelty 6.0

Empirical analysis across 15 LLMs and 1,141 skills identifies a logarithmic routing decay law and a multiplicative execution law coupled by a single fitted slope parameter b that enables targeted library optimizations improving routing accuracy and downstream task pass rates.

citing papers explorer

Showing 3 of 3 citing papers after filters.

  • Evaluating Tool Cloning in Agentic-AI Ecosystems cs.SE · 2026-05-10 · conditional · none · ref 66 · 2 links · internal anchor

    Tool cloning is pervasive in agentic AI ecosystems, with 60% of high-Jaccard and 85% of high-ssdeep MCP repository pairs manually verified as true clones.

  • Firefly: Illuminating Large-Scale Verified Tool-Call Data Generation from Real APIs cs.SE · 2026-05-17 · unverdicted · none · ref 13 · internal anchor

    Firefly inverts task synthesis by exploring real MCP servers first via pairwise tool graphs and sub-DAG sampling, then generates 5,144 verified tasks backward from outcomes to train a 4B model that matches Claude Sonnet 4.6 on tool-calling benchmarks.

  • OpenRath: Session-Centered Runtime State for Agent Systems cs.SE · 2026-06-17 · unverdicted · none · ref 18 · internal anchor

    OpenRath introduces Session as a first-class, branchable runtime value that unifies fragmented state in multi-agent systems and makes fork, merge, and replay explicit operations.