DASB is a new benchmark for discrete audio tokens showing semantic tokens outperform acoustic ones but discrete representations remain less robust than continuous features across domains.
Title resolution pending
5 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 5representative citing papers
MAGE unifies text, visual, and audio-conditioned music generation and editing in one flow-based latent model with dynamic modality masking and cross-gated control.
Q2D2 uses 2D geometric grid projections to quantize feature pairs in neural audio codecs, yielding implicit codebooks that improve efficiency and utilization over RVQ, VQ, and FSQ while maintaining reconstruction quality.
MERT embedding-based MSE and intrusive FAD metrics correlate more strongly with perceptual audio quality ratings than BSS-Eval metrics across stems and models in musical source separation.
XAttnMark is a new neural audio watermarking method using partial parameter sharing, cross-attention for message retrieval, temporal conditioning, and a psychoacoustic TF masking loss that reports state-of-the-art detection and attribution robustness.
citing papers explorer
-
DASB - Discrete Audio and Speech Benchmark
DASB is a new benchmark for discrete audio tokens showing semantic tokens outperform acoustic ones but discrete representations remain less robust than continuous features across domains.
-
MAGE: Modality-Agnostic Music Generation and Editing
MAGE unifies text, visual, and audio-conditioned music generation and editing in one flow-based latent model with dynamic modality masking and cross-gated control.
-
Two-Dimensional Quantization for Geometry-Aware Audio Coding
Q2D2 uses 2D geometric grid projections to quantize feature pairs in neural audio codecs, yielding implicit codebooks that improve efficiency and utilization over RVQ, VQ, and FSQ while maintaining reconstruction quality.
-
Embedding-Based Intrusive Evaluation Metrics for Musical Source Separation Using MERT Representations
MERT embedding-based MSE and intrusive FAD metrics correlate more strongly with perceptual audio quality ratings than BSS-Eval metrics across stems and models in musical source separation.
-
XAttnMark: Learning Robust Audio Watermarking with Cross-Attention
XAttnMark is a new neural audio watermarking method using partial parameter sharing, cross-attention for message retrieval, temporal conditioning, and a psychoacoustic TF masking loss that reports state-of-the-art detection and attribution robustness.