CleanCodec reframes audio tokenization as a selective information bottleneck to encode only perceptually important features at 12.5 tokens per second, outperforming prior codecs in efficiency, speaker similarity, and intelligibility.
VoxCeleb: A Large-Scale Speaker Identification Dataset
2 Pith papers cite this work, alongside 1,384 external citations. Polarity classification is still being indexed.
2 Pith papers citing it · 1,384 external citations (Crossref)
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Privacy-preserving Prosody Representation Learning
A self-supervised prosody encoder with speaker disentanglement strategies outperforms raw prosody and HuBERT baselines on pitch reconstruction and prosodic event detection while achieving strong speaker separation.