pith. machine review for the scientific record. sign in

Bigvgan: A universal neural vocoder with large-scale training

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

verdicts

UNVERDICTED 3

representative citing papers

Latent Fourier Transform

cs.SD · 2026-04-20 · unverdicted · novelty 7.0

LatentFT uses latent-space Fourier transforms and frequency masking in diffusion autoencoders to enable timescale-specific manipulation of musical structure in generative models.

Kimi-Audio Technical Report

eess.AS · 2025-04-25 · unverdicted · novelty 5.0

Kimi-Audio is an open-source audio foundation model that achieves state-of-the-art results on speech recognition, audio understanding, question answering, and conversation after pre-training on more than 13 million hours of speech, sound, and music data.

Movie Gen: A Cast of Media Foundation Models

cs.CV · 2024-10-17 · unverdicted · novelty 5.0

A 30B-parameter transformer and related models generate high-quality videos and audio, claiming state-of-the-art results on text-to-video, video editing, personalization, and audio generation tasks.

citing papers explorer

Showing 3 of 3 citing papers.

  • Latent Fourier Transform cs.SD · 2026-04-20 · unverdicted · none · ref 23

    LatentFT uses latent-space Fourier transforms and frequency masking in diffusion autoencoders to enable timescale-specific manipulation of musical structure in generative models.

  • Kimi-Audio Technical Report eess.AS · 2025-04-25 · unverdicted · none · ref 38

    Kimi-Audio is an open-source audio foundation model that achieves state-of-the-art results on speech recognition, audio understanding, question answering, and conversation after pre-training on more than 13 million hours of speech, sound, and music data.

  • Movie Gen: A Cast of Media Foundation Models cs.CV · 2024-10-17 · unverdicted · none · ref 40

    A 30B-parameter transformer and related models generate high-quality videos and audio, claiming state-of-the-art results on text-to-video, video editing, personalization, and audio generation tasks.