Learning Latent Representations for Speech Generation and Transformation

2017 · cs.CL · arXiv 1704.04222

2 Pith papers cite this work.

abstract

An ability to model a generative process and learn a latent representation for speech in an unsupervised fashion will be crucial to process vast quantities of unlabelled speech data. Recently, deep probabilistic generative models such as Variational Autoencoders (VAEs) have achieved tremendous success in modeling natural images. In this paper, we apply a convolutional VAE to model the generative process of natural speech. We derive latent space arithmetic operations to disentangle learned latent representations. We demonstrate the capability of our model to modify the phonetic content or the speaker identity for speech segments using the derived operations, without the need for parallel supervisory data.
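The abstract mentions latent space arithmetic for changing speaker identity or phonetic content. Below is a minimal sketch of one common form of that idea, a mean-shift in latent space. It is not the authors' code. The sketch assumes three things: each speech segment has already been encoded to a latent vector by some VAE encoder (not shown), segments carry an attribute label such as a speaker ID, and an attribute is represented by the mean latent vector of its segments. The helper names (`attribute_means`, `shift_attribute`) and the speaker labels are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length latent vectors."""
    if not vectors:
        raise ValueError("need at least one latent vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all latent vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def attribute_means(latents: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Vector]:
    """Average the latent vectors that share an attribute label (hypothetical speaker IDs here)."""
    if len(latents) != len(labels):
        raise ValueError("each latent vector needs exactly one label")
    groups: Dict[str, List[Vector]] = {}
    for z, label in zip(latents, labels):
        groups.setdefault(label, []).append(z)
    return {label: mean_vector(vs) for label, vs in groups.items()}


def shift_attribute(z: Vector, source_mean: Vector, target_mean: Vector) -> Vector:
    """Compute z' = z - mu_source + mu_target, moving z from the source attribute to the target."""
    if not (len(z) == len(source_mean) == len(target_mean)):
        raise ValueError("dimension mismatch")
    return [zi - s + t for zi, s, t in zip(z, source_mean, target_mean)]


# Toy 2-D latents standing in for encoder outputs (made-up numbers, not data from the paper).
latents = [[1.0, 0.0], [3.0, 2.0], [0.0, 5.0], [2.0, 7.0]]
labels = ["spk_a", "spk_a", "spk_b", "spk_b"]
means = attribute_means(latents, labels)
# Move the first spk_a segment toward spk_b; a decoder (not shown) would map z_converted back to speech.
z_converted = shift_attribute(latents[0], means["spk_a"], means["spk_b"])
```

With these toy values, `means["spk_a"]` is `[2.0, 1.0]`, `means["spk_b"]` is `[1.0, 6.0]`, and `z_converted` is `[0.0, 5.0]`. The same shift could target phonetic content by grouping segments by phone label instead of by speaker. Because the operation needs only per-attribute averages, it does not need parallel source and target recordings, which is consistent with the abstract's claim.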

representative citing papers

Non-Parallel Voice Conversion with Cyclic Variational Autoencoder

eess.AS · 2019-07-24 · unverdicted · novelty 6.0

CycleVAE optimizes non-parallel voice conversion indirectly through cyclically reconstructed spectra. This yields higher spectral accuracy, higher latent-feature correlation, and better converted-speech quality.

Classical Music Prediction and Composition by means of Variational Autoencoders

cs.SD · 2019-06-21 · unverdicted · novelty 3.0

VAEs are trained on classical music to encode pieces into latent space and predict continuations, enabling composition of new music from existing pieces or random starts even with small training sets.

citing papers explorer

Non-Parallel Voice Conversion with Cyclic Variational Autoencoder · eess.AS · 2019-07-24 · ref 38
Classical Music Prediction and Composition by means of Variational Autoencoders · cs.SD · 2019-06-21 · ref 17
