arXiv preprint arXiv.2304.08354

Yujia Qin, Shengding Hu, Yankai Lin, et al. · 2023 · arXiv 2304.08354

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

FactoryBench: Evaluating Industrial Machine Understanding

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

FactoryBench reveals that frontier LLMs achieve under 50% on structured causal questions and under 18% on decision-making in industrial robotic telemetry.

TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data

cs.AI · 2026-04-30 · unverdicted · novelty 7.0

TADI shows that domain-specialized tools orchestrated by an LLM over dual structured and semantic databases can convert heterogeneous wellsite data into evidence-grounded drilling intelligence, with tool design mattering more than model scale.

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

cs.LG · 2024-01-19 · conditional · novelty 7.0

Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.

TIDE-Bench: Task-Aware and Diagnostic Evaluation of Tool-Integrated Reasoning

cs.AI · 2026-05-10 · unverdicted · novelty 6.0

TIDE-Bench is a new benchmark for tool-integrated reasoning that combines diverse tasks, multi-aspect metrics covering answer quality, process reliability, efficiency and cost, plus filtered challenging test sets.

ToolRL: Reward is All Tool Learning Needs

cs.LG · 2025-04-16 · conditional · novelty 6.0

A principled reward design for tool selection and application in RL-trained LLMs delivers 17% gains over base models and 15% over SFT across benchmarks.

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

cs.CL · 2024-10-30 · unverdicted · novelty 6.0

OS-Atlas, trained on the largest open-source cross-platform GUI grounding corpus of 13 million elements, outperforms prior open-source models on six benchmarks across mobile, desktop, and web platforms.

Is Grep All You Need? How Agent Harnesses Reshape Agentic Search

cs.CL · 2026-05-14 · unverdicted · novelty 5.0

Grep retrieval generally outperforms vector retrieval in agentic search tasks, with performance varying strongly by agent harness and tool-calling style.

Understanding the planning of LLM agents: A survey

cs.AI · 2024-02-05 · accept · novelty 4.0

A survey that provides a taxonomy of methods for improving planning in LLM-based agents across task decomposition, plan selection, external modules, reflection, and memory.

The Rise and Potential of Large Language Model Based Agents: A Survey

cs.AI · 2023-09-14 · accept · novelty 4.0

The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

cs.CL · 2023-09-03 · unverdicted · novelty 4.0

A literature survey that taxonomizes hallucination phenomena in LLMs, reviews evaluation benchmarks, and analyzes approaches for their detection, explanation, and mitigation.

SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks

cs.AI · 2026-05-09

citing papers explorer

Showing 2 of 2 citing papers after filters.

Understanding the planning of LLM agents: A survey

cs.AI · 2024-02-05 · accept · none · ref 33

A survey that provides a taxonomy of methods for improving planning in LLM-based agents across task decomposition, plan selection, external modules, reflection, and memory.

The Rise and Potential of Large Language Model Based Agents: A Survey

cs.AI · 2023-09-14 · accept · none · ref 95

The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.
