Distilling an end-to-end voice assistant without instruction training data

William Held, Ella Li, Michael Ryan, Weiyan Shi, Yanzhe Zhang, Diyi Yang, “Distilling an end-to-end voice assistant without instruction training data,” arXiv preprint arXiv:2410 · 2024 · arXiv 2410.02678

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

VoiceBench: Benchmarking LLM-Based Voice Assistants

cs.CL · 2024-10-22 · unverdicted · novelty 7.0

VoiceBench is the first benchmark for multi-faceted evaluation of LLM voice assistants using real and synthetic spoken instructions with speaker, environmental, and content variations.

Benchmarking Gaslighting Attacks Against Speech Large Language Models

cs.CL · 2025-09-24 · unverdicted · novelty 6.0

Gaslighting attacks using Anger, Cognitive Disruption, Sarcasm, Implicit, and Professional Negation strategies cause a 24.3% average accuracy drop in Speech LLMs while also triggering behavioral changes like apologies and refusals.

On The Landscape of Spoken Language Models: A Comprehensive Survey

cs.CL · 2025-04-11 · unverdicted · novelty 3.0

A literature survey that organizes spoken language models by architecture, training, and evaluation choices and identifies key challenges and future directions.

citing papers explorer

Showing 3 of 3 citing papers.

VoiceBench: Benchmarking LLM-Based Voice Assistants · cs.CL · 2024-10-22 · unverdicted · none · ref 74
Benchmarking Gaslighting Attacks Against Speech Large Language Models · cs.CL · 2025-09-24 · unverdicted · none · ref 15
On The Landscape of Spoken Language Models: A Comprehensive Survey · cs.CL · 2025-04-11 · unverdicted · none · ref 22
