High-fidelity audio compression with improved rvqgan

Rithesh Kumar, Prem Seetharaman, Alejandro Luebs, Ishaan Kumar, Kundan Kumar · 2023 · arXiv 2306.06546

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Codec-Robust Attacks on Audio LLMs

cs.SD · 2026-05-19 · unverdicted · novelty 7.0 · 2 refs

CodecAttack perturbs audio in codec latent space with multi-bitrate EoT to achieve 85.5% average ASR on Opus-compressed Audio LLMs versus under 26% for waveform baselines, with transfer to MP3 and AAC.

Two-Dimensional Quantization for Geometry-Aware Audio Coding

cs.SD · 2025-12-01 · unverdicted · novelty 6.0

Q2D2 uses 2D geometric grid projections to quantize feature pairs in neural audio codecs, yielding implicit codebooks that improve efficiency and utilization over RVQ, VQ, and FSQ while maintaining reconstruction quality.

Drum Synthesis from Expressive Drum Grids via Neural Audio Codecs

cs.SD · 2026-05-11 · unverdicted · novelty 5.0

A Transformer predicts tokens from neural audio codecs (EnCodec, DAC, X-Codec) to convert expressive drum grids into audio, trained and evaluated on the E-GMD dataset using objective metrics.

Woosh: A Sound Effects Foundation Model

cs.SD · 2026-04-02 · accept · novelty 5.0

Woosh is a new publicly released foundation model optimized for high-quality sound effect generation from text or video, showing competitive or better results than open alternatives like Stable Audio Open.

citing papers explorer

Showing 4 of 4 citing papers.

Codec-Robust Attacks on Audio LLMs cs.SD · 2026-05-19 · unverdicted · none · ref 23 · 2 links
CodecAttack perturbs audio in codec latent space with multi-bitrate EoT to achieve 85.5% average ASR on Opus-compressed Audio LLMs versus under 26% for waveform baselines, with transfer to MP3 and AAC.
Two-Dimensional Quantization for Geometry-Aware Audio Coding cs.SD · 2025-12-01 · unverdicted · none · ref 42
Q2D2 uses 2D geometric grid projections to quantize feature pairs in neural audio codecs, yielding implicit codebooks that improve efficiency and utilization over RVQ, VQ, and FSQ while maintaining reconstruction quality.
Drum Synthesis from Expressive Drum Grids via Neural Audio Codecs cs.SD · 2026-05-11 · unverdicted · none · ref 6
A Transformer predicts tokens from neural audio codecs (EnCodec, DAC, X-Codec) to convert expressive drum grids into audio, trained and evaluated on the E-GMD dataset using objective metrics.
Woosh: A Sound Effects Foundation Model cs.SD · 2026-04-02 · accept · none · ref 13
Woosh is a new publicly released foundation model optimized for high-quality sound effect generation from text or video, showing competitive or better results than open alternatives like Stable Audio Open.

High-fidelity audio compression with improved rvqgan

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer