DASB is a new benchmark for discrete audio tokens. It shows that semantic tokens outperform acoustic ones, but that discrete representations remain less robust than continuous features across domains.
6 Pith papers cite this work.
representative citing papers
Q2D2 uses 2D geometric grid projections to quantize feature pairs in neural audio codecs, yielding implicit codebooks that improve efficiency and utilization over RVQ, VQ, and FSQ while maintaining reconstruction quality.
Permutation-equivariant training via matched random channel shuffling improves SDR and reduces microphone bleed in multi-channel music source separation under unseen conditions.
MERT embedding-based MSE and intrusive FAD metrics correlate more strongly with perceptual audio quality ratings than BSS-Eval metrics across stems and models in musical source separation.
XAttnMark is a new neural audio watermarking method using partial parameter sharing, cross-attention for message retrieval, temporal conditioning, and a psychoacoustic TF masking loss that reports state-of-the-art detection and attribution robustness.