MARS is a transfer-based black-box attack that uses bi-level optimization on semantic and artifact anchors to escape the linearity trap and improve attack success rates on SSL-SVDD by up to 36%.
Transformer Transducer: A Streamable Speech Recognition Model
6 Pith papers cite this work. Polarity classification is still indexing.
Citation-role and citation-polarity summary
Years: 2026 (6)
Verdicts: UNVERDICTED (6)
Roles: background (1)
Polarities: background (1)
Representative citing papers
Layer-wise probing of wav2vec2-base and Whisper-small shows both models distinguish reduced vs. canonical consonant clusters in AAE with high accuracy and retain cues to underlying stops, encoding CCR as gradient variation.
MIRAGE uses adaptive multimodal gating on native multimodal backbones plus a transformer encoder to achieve state-of-the-art whole-brain fMRI prediction for naturalistic audiovisual stimuli, outperforming post-hoc unimodal aggregation.
An encoding probe reconstructs transformer representations from acoustic, phonetic, syntactic, lexical and speaker features, showing independent syntactic/lexical contributions and training-dependent speaker effects.
WhisperPipe delivers 89 ms median latency and 48% lower peak GPU memory than standard Whisper while keeping word error rate within 2% of the offline model.
MOSS-Audio is an audio-language model using a 12.5 Hz encoder, DeepStack cross-layer injection, time markers, and an event-preserving annotation pipeline for unified audio understanding.
citing papers explorer
- WhisperPipe: A Resource-Efficient Streaming Architecture for Real-Time Automatic Speech Recognition