Efficient and generalizable speaker diarization via structured pruning of self-supervised models

2025 · arXiv 2506.18623

4 Pith papers cite this work.


representative citing papers

Tight Boundary Prediction in Speaker Diarization Using Causal-Anticausal Consistency

eess.AS · 2026-06-10 · unverdicted · novelty 6.0

Causal-anticausal consistency co-training recovers about 70% of the boundary-tightening effect possible with ideal tight labels in speaker diarization.

Balancing ASR and diarization in end-to-end LLMs for multi-talker speech recognition

eess.AS · 2026-06-11 · unverdicted · novelty 4.0

LLM-based multi-talker ASR with dual-encoder, feature interleaving, length-aware speaker loss, and adaptive ASR threshold achieves 18% and 24% relative gains over baselines on AliMeeting and Aishell4.

Audio-Mind: An Auditable Agentic Framework for Audio Understanding

eess.AS · 2026-05-27 · unverdicted · novelty 4.0

Audio-Mind introduces a conditional, auditable agentic framework for audio understanding that preserves frontend judgment and acquires bounded external evidence only when needed, reporting 80.4% on MMAR and 82.8% on MSU-Bench.

Exploring Speech Foundation Models for Speaker Diarization Across Lifespan

eess.AS · 2026-04-06 · unverdicted · novelty 4.0 · 2 refs

Cross-lifespan evaluation shows that adult-trained speech foundation models degrade on child and older-adult data; joint multi-age training and targeted adaptation improve robustness, especially with the Whisper encoder.

citing papers explorer


Tight Boundary Prediction in Speaker Diarization Using Causal-Anticausal Consistency

eess.AS · 2026-06-10 · unverdicted · none · ref 43

Causal-anticausal consistency co-training recovers about 70% of the boundary-tightening effect possible with ideal tight labels in speaker diarization.

Balancing ASR and diarization in end-to-end LLMs for multi-talker speech recognition

eess.AS · 2026-06-11 · unverdicted · none · ref 36

LLM-based multi-talker ASR with dual-encoder, feature interleaving, length-aware speaker loss, and adaptive ASR threshold achieves 18% and 24% relative gains over baselines on AliMeeting and Aishell4.

Audio-Mind: An Auditable Agentic Framework for Audio Understanding

eess.AS · 2026-05-27 · unverdicted · none · ref 13

Audio-Mind introduces a conditional, auditable agentic framework for audio understanding that preserves frontend judgment and acquires bounded external evidence only when needed, reporting 80.4% on MMAR and 82.8% on MSU-Bench.

Exploring Speech Foundation Models for Speaker Diarization Across Lifespan

eess.AS · 2026-04-06 · unverdicted · none · ref 19 · 2 links

Cross-lifespan evaluation shows that adult-trained speech foundation models degrade on child and older-adult data; joint multi-age training and targeted adaptation improve robustness, especially with the Whisper encoder.
