WSRGlow: A Glow-based Waveform Generative Model for Audio Super-Resolution

Changliang Xu; Kexun Zhang; Yi Ren; Zhou Zhao

arXiv: 2106.08507 · v1 · pith:TUJCQPP6 · submitted 2021-06-16 · 💻 cs.SD · eess.AS

keywords: audio, model, generative, information, super-resolution, wsrglow, domain, encoder

Abstract

Audio super-resolution is the task of constructing high-resolution (HR) audio from low-resolution (LR) audio by adding the missing high-frequency band. Previous methods based on convolutional neural networks trained with a mean squared error objective perform relatively poorly, while adversarial generative models are difficult to train and tune. Recently, normalizing flows have attracted considerable attention for their high performance, simple training, and fast inference. In this paper, we propose WSRGlow, a Glow-based waveform generative model for audio super-resolution. Specifically, 1) we integrate WaveNet and Glow to directly maximize the exact likelihood of the target HR audio conditioned on LR information; and 2) to exploit the information in the LR audio, we propose an LR audio encoder and an STFT encoder, which encode the LR information from the time domain and the frequency domain, respectively. Experimental results show that the proposed model is easier to train and outperforms previous work in both objective and perceptual quality. WSRGlow is also the first model to produce 48 kHz waveforms from 12 kHz LR audio.
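The abstract's core idea, maximizing the exact likelihood of HR audio conditioned on LR information, rests on the change-of-variables formula: for an invertible map z = f(x; c), log p(x | c) = log p_Z(z) + log|det ∂z/∂x|. Below is a minimal NumPy sketch of a conditional affine-coupling flow in the Glow/WaveGlow style, written only to illustrate that mechanism under heavy simplifying assumptions. It is not the WSRGlow implementation: a single random linear map stands in for the WaveNet inside each coupling layer, a fixed channel reversal replaces Glow's invertible 1×1 convolution, the raw LR waveform (folded into 2 channels) replaces the paper's LR audio encoder, the STFT encoder is omitted entirely, and the 12 kHz input is faked by averaging blocks of 4 samples. All names (`squeeze`, `ConditionalFlow`, `super_resolve`, `GROUP_LR`, etc.) and the sampling temperature `sigma=0.6` are hypothetical choices.

```python
import numpy as np

UPSAMPLE = 48_000 // 12_000     # 12 kHz LR -> 48 kHz HR: factor 4
GROUP_LR = 2                    # LR samples per frame (hypothetical choice)
GROUP_HR = GROUP_LR * UPSAMPLE  # HR samples per frame (8), so frames align


def squeeze(x, group):
    """Fold a 1-D signal into channels: (T,) -> (group, T // group)."""
    return x.reshape(-1, group).T


def unsqueeze(x):
    """Inverse of squeeze: (group, F) -> (group * F,)."""
    return x.T.reshape(-1)


class ConditionalAffineCoupling:
    """Affine coupling layer conditioned on LR frames.

    A single random linear map stands in for WaveNet (assumption);
    tanh bounds the log-scales for numerical stability.
    """

    def __init__(self, channels, cond_channels, rng):
        self.half = channels // 2
        n_out = 2 * (channels - self.half)  # log-scale and shift for 2nd half
        self.w = rng.normal(scale=0.1, size=(n_out, self.half + cond_channels))

    def _params(self, xa, cond):
        h = self.w @ np.concatenate([xa, cond], axis=0)
        log_s, t = np.split(h, 2, axis=0)
        return np.tanh(log_s), t

    def forward(self, x, cond):
        xa, xb = x[: self.half], x[self.half :]
        log_s, t = self._params(xa, cond)
        zb = xb * np.exp(log_s) + t
        # Jacobian is block-triangular, so log|det| is the sum of log-scales.
        return np.concatenate([xa, zb], axis=0), log_s.sum()

    def inverse(self, z, cond):
        za, zb = z[: self.half], z[self.half :]
        log_s, t = self._params(za, cond)  # za == xa, so params are recomputable
        return np.concatenate([za, (zb - t) * np.exp(-log_s)], axis=0)


class ConditionalFlow:
    """Stack of couplings; a fixed channel reversal replaces Glow's 1x1 conv."""

    def __init__(self, n_layers, channels, cond_channels, seed=0):
        rng = np.random.default_rng(seed)
        self.layers = [
            ConditionalAffineCoupling(channels, cond_channels, rng)
            for _ in range(n_layers)
        ]

    def forward(self, x, cond):
        logdet = 0.0
        for layer in self.layers:
            x, ld = layer.forward(x, cond)
            logdet += ld
            x = x[::-1]  # channel permutation: contributes log|det| = 0
        return x, logdet

    def inverse(self, z, cond):
        for layer in reversed(self.layers):
            z = layer.inverse(z[::-1], cond)
        return z


def log_likelihood(flow, hr, lr, sigma=1.0):
    """Exact log p(HR | LR) via the change-of-variables formula."""
    z, logdet = flow.forward(squeeze(hr, GROUP_HR), squeeze(lr, GROUP_LR))
    log_pz = -0.5 * np.sum(z**2) / sigma**2 - 0.5 * z.size * np.log(2 * np.pi * sigma**2)
    return log_pz + logdet


def super_resolve(flow, lr, sigma=0.6, seed=1):
    """Sample z ~ N(0, sigma^2 I) and invert the flow given the LR condition."""
    cond = squeeze(lr, GROUP_LR)
    z = np.random.default_rng(seed).normal(scale=sigma, size=(GROUP_HR, cond.shape[1]))
    return unsqueeze(flow.inverse(z, cond))


# Usage on a toy 0.1 s, 440 Hz tone at 48 kHz.
hr = np.sin(2 * np.pi * 440 * np.arange(4800) / 48_000)
lr = hr.reshape(-1, UPSAMPLE).mean(axis=1)  # crude 12 kHz input (assumption)
flow = ConditionalFlow(n_layers=4, channels=GROUP_HR, cond_channels=GROUP_LR)

z, _ = flow.forward(squeeze(hr, GROUP_HR), squeeze(lr, GROUP_LR))
recon = unsqueeze(flow.inverse(z, squeeze(lr, GROUP_LR)))
print(np.allclose(recon, hr))  # True: the flow is exactly invertible
print(log_likelihood(flow, hr, lr))
print(super_resolve(flow, lr).shape)  # (4800,)
```

Because each coupling layer passes half the channels through unchanged and only rescales and shifts the other half, both the inverse and the log-determinant are cheap to compute. That property is what allows a flow to be trained by exact maximum likelihood and to generate an HR waveform in a single inverse pass, which is the trade-off the abstract contrasts with MSE-trained CNNs and adversarial models.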

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Survey of Advancing Audio Super-Resolution and Bandwidth Extension from Discriminative to Generative Models
eess.AS 2026-05 unverdicted novelty 2.0

A structured survey of audio bandwidth extension that organizes the transition from deterministic discriminative DNNs to generative approaches including GANs, diffusion models, and flow-based methods.