arxiv: 2601.03872 · v2 · pith:WEBJZ6RZnew · submitted 2026-01-07 · 💻 cs.CL

Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning

classification 💻 cs.CL
keywords tools, atlas, models, reasoning, routing, across, alignment, complex

The integration of large language models (LLMs) with external tools has significantly expanded the capabilities of AI agents. However, as the diversity of both LLMs and tools increases, selecting the optimal model-tool combination becomes a high-dimensional optimization challenge. Existing approaches often rely on a single model or fixed tool-calling logic and so fail to exploit the performance variations across heterogeneous model-tool pairs. In this paper, we present ATLAS (Adaptive Tool-LLM Alignment and Synergistic Invocation), a dual-path framework for dynamic tool usage in cross-domain complex reasoning. Its two paths are: (1) training-free cluster-based routing, which exploits empirical priors for domain-specific alignment, and (2) RL-based multi-step routing, which explores autonomous trajectories for out-of-distribution generalization. Extensive experiments across 15 benchmarks demonstrate that our method outperforms closed-source models like GPT-4o and surpasses existing routing methods on both in-distribution (+10.1%) and out-of-distribution (+13.1%) tasks. Furthermore, our framework shows significant gains in visual reasoning by orchestrating specialized multi-modal tools.
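The abstract names training-free cluster-based routing but does not spell out the procedure. The following is a minimal sketch of one way such a router could work, not the authors' implementation. It assumes that queries are already embedded as numeric vectors, that cluster centroids are given, and that a history of past outcomes per (model, tool) pair is available. All names here (`nearest_cluster`, `build_priors`, `route`, `model_a`, `calculator`, and so on) are hypothetical.

```python
from collections import defaultdict


def nearest_cluster(centroids, embedding):
    """Return the index of the centroid closest to the embedding (squared Euclidean)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, embedding))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


def build_priors(history, centroids):
    """Estimate per-cluster success rates for each (model, tool) pair.

    history: iterable of (embedding, (model, tool), correct) records.
    Returns {cluster_index: {(model, tool): empirical_accuracy}}.
    """
    counts = defaultdict(lambda: [0, 0])  # (cluster, pair) -> [successes, trials]
    for embedding, pair, correct in history:
        key = (nearest_cluster(centroids, embedding), pair)
        counts[key][0] += int(correct)
        counts[key][1] += 1
    priors = {}
    for (cluster, pair), (wins, trials) in counts.items():
        priors.setdefault(cluster, {})[pair] = wins / trials
    return priors


def route(priors, centroids, embedding, default_pair):
    """Pick the pair with the best empirical accuracy in the query's cluster."""
    scores = priors.get(nearest_cluster(centroids, embedding))
    if not scores:  # no history for this cluster: fall back to a default pair
        return default_pair
    return max(scores, key=scores.get)


# Toy data (assumed for illustration only): 2-D embeddings, three clusters.
centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
history = [
    ((0.1, 0.0), ("model_a", "calculator"), True),
    ((0.0, 0.2), ("model_a", "calculator"), True),
    ((0.1, 0.1), ("model_b", "search"), False),
    ((0.9, 1.0), ("model_b", "search"), True),
    ((1.0, 0.8), ("model_a", "calculator"), False),
]
priors = build_priors(history, centroids)
default = ("model_a", "none")
```

No training step is involved: routing reduces to a nearest-centroid lookup followed by an argmax over stored accuracies. With the toy data, a query near the first centroid is routed to `("model_a", "calculator")`, one near the second to `("model_b", "search")`, and one near the third, which has no history, to the default pair.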

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ProactiveMobile: A Comprehensive Benchmark for Boosting Proactive Intelligence on Mobile Devices

    cs.AI 2026-02 conditional novelty 7.0

    ProactiveMobile is a new benchmark for proactive mobile agents that tests latent intent inference from context and executable API generation, where a fine-tuned 7B model reaches 19.15% success versus 15.71% for o1 and...

  2. Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

    cs.LG 2026-05 unverdicted novelty 6.0

    Maestro uses outcome-based RL to train a lightweight policy that orchestrates ensembles of frozen expert models and skills, reporting 70.1% average accuracy across ten multimodal benchmarks and outperforming GPT-5 and...

  3. Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation

    cs.AI 2026-05 unverdicted novelty 6.0

    A learned orchestration policy for LLM agents that jointly optimizes task decomposition and selective routing to (model, primitive) pairs, delivering 77% macro pass@1 at 10x lower cost than strong baselines across 13 ...