HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

· 2020 · arXiv 2010.05646

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method 2

citation-polarity summary

use method 2

representative citing papers

Tibetan-TTS: Low-Resource Tibetan Speech Synthesis with Large Model Adaptation

cs.SD · 2026-05-04 · unverdicted · novelty 7.0

Large-model adaptation with Tibetan text handling produces natural speech from limited data, outperforming commercial systems.

Step-Audio 2 Technical Report

cs.CL · 2025-07-22 · unverdicted · novelty 6.0

Step-Audio 2 integrates a latent audio encoder, reasoning-centric reinforcement learning, and discrete audio token generation into language modeling to deliver state-of-the-art performance on audio understanding and conversational benchmarks.

Audio Deepfake Detection with Half-Truth Localisation Using Cross-Attentive Feature Fusion

cs.SD · 2026-05-28 · unverdicted · novelty 5.0

CAFNet performs joint ternary classification and temporal boundary regression for half-truth audio deepfakes via cross-attentive fusion of MFCC, LFCC, and Chroma-STFT features, reporting 92.71% accuracy and 0.075s MAE on MLADDC T2+T3.

Woosh: A Sound Effects Foundation Model

cs.SD · 2026-04-02 · accept · novelty 5.0

Woosh is a new publicly released foundation model optimized for high-quality sound effect generation from text or video, showing competitive or better results than open alternatives like Stable Audio Open.

LTX-2: Efficient Joint Audio-Visual Foundation Model

cs.CV · 2026-01-06 · conditional · novelty 5.0

LTX-2 generates high-quality synchronized audiovisual content from text prompts via an asymmetric 14B-video / 5B-audio dual-stream transformer with cross-attention and modality-aware guidance.

Few-Shot Synthetic Accented Speech for ASR Fine-Tuning: What Helps and When?

cs.SD · 2026-04-30

citing papers explorer

Showing 6 of 6 citing papers.

Tibetan-TTS: Low-Resource Tibetan Speech Synthesis with Large Model Adaptation · cs.SD · 2026-05-04 · unverdicted · none · ref 16
Step-Audio 2 Technical Report · cs.CL · 2025-07-22 · unverdicted · none · ref 42
Audio Deepfake Detection with Half-Truth Localisation Using Cross-Attentive Feature Fusion · cs.SD · 2026-05-28 · unverdicted · none · ref 1
Woosh: A Sound Effects Foundation Model · cs.SD · 2026-04-02 · accept · none · ref 20
LTX-2: Efficient Joint Audio-Visual Foundation Model · cs.CV · 2026-01-06 · conditional · none · ref 13
Few-Shot Synthetic Accented Speech for ASR Fine-Tuning: What Helps and When? · cs.SD · 2026-04-30 · unreviewed · ref 39
