AudioToolAgent: An agentic framework for audio-language models

· 2025 · arXiv 2510.02995

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook

cs.SD · 2026-05-18 · unverdicted · novelty 5.0

A survey of Large Audio Language Models that establishes a taxonomy of trustworthiness vulnerabilities and proposes a Defense-in-Depth roadmap for audio intelligence.

Audio-Mind: An Auditable Agentic Framework for Audio Understanding

eess.AS · 2026-05-27 · unverdicted · novelty 4.0

Audio-Mind introduces a conditional, auditable agentic framework for audio understanding that preserves frontend judgment and acquires bounded external evidence only when needed, reporting 80.4% on MMAR and 82.8% on MSU-Bench.

VISA: A Visual Information Strengthened Audio-Reasoning System for the Interspeech 2026 ARC Agent Track

eess.AS · 2026-06-05 · unverdicted · novelty 3.0

VISA ranks 2nd in the Interspeech 2026 ARC Agent Track by adding multi-modal feature extraction, consistency-checked model voting, and rubric-aligned routing to large audio language models, reaching 66.23% Rubrics score and 77.40% accuracy.

A Survey of Audio Reasoning in Multimodal Foundation Models

eess.AS · 2026-05-20 · unverdicted · novelty 2.0

A survey that provides a unified formulation of audio reasoning and reviews advances across Audio-to-Text, Audio-to-Speech, Audio-Visual, and Agentic paradigms while discussing challenges and future directions.

citing papers explorer

Showing 4 of 4 citing papers.

A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook
cs.SD · 2026-05-18 · unverdicted · none · ref 93

Audio-Mind: An Auditable Agentic Framework for Audio Understanding
eess.AS · 2026-05-27 · unverdicted · none · ref 42

VISA: A Visual Information Strengthened Audio-Reasoning System for the Interspeech 2026 ARC Agent Track
eess.AS · 2026-06-05 · unverdicted · none · ref 22

A Survey of Audio Reasoning in Multimodal Foundation Models
eess.AS · 2026-05-20 · unverdicted · none · ref 110