arxiv: 2606.04943 · v1 · pith:6HFLIEREnew · submitted 2026-06-03 · 📡 eess.AS · eess.SP

Differentiable Articulatory Copy-Synthesis of Biphonic Singing

keywords: articulatory, sygyt, baseline, biphonic, control, copy-synthesis, differentiable, model

Sygyt is a Tuvan style of biphonic singing in which a low vocal drone is sustained while a high harmonic is selectively amplified in the 1-3 kHz region. Copy-synthesizing this effect remains challenging for articulatory models, since it requires fine control of narrowly focused resonances that standard low-dimensional tract parameterizations cannot easily reproduce. We address this problem with a differentiable Kelly-Lochbaum waveguide augmented with a sublingual second source, cubic B-spline tract parameterization, and spatially varying learnable damping, optimized end-to-end by gradient descent from audio. On 20 segments from two independent sygyt datasets (5 singers, 10 pitches), the proposed model reduces log-spectral distance by 30-38% relative to an articulatory baseline, with the largest gains concentrated in the overtone region. Cepstral-envelope analysis further shows more accurate recovery of the merged formant structure characteristic of sygyt production. The model also outperforms a DDSP harmonic-plus-noise baseline with direct per-harmonic spectral control, suggesting that explicit acoustic structure is a useful inductive bias for overtone-singing copy-synthesis.
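The abstract reports results as log-spectral distance but does not define it here. The sketch below shows one common formulation (frame-wise RMS difference of log power spectra in dB, averaged over frames) purely as an illustration. The function name `log_spectral_distance`, the Hann window, and the `n_fft`, `hop`, and `eps` values are assumptions for this example, not the paper's settings, and the paper's exact metric may differ.

```python
import numpy as np

def log_spectral_distance(ref, est, n_fft=1024, hop=256, eps=1e-10):
    """Illustrative log-spectral distance in dB (assumed definition).

    ref, est: 1-D signals of equal length, each at least n_fft samples long.
    """
    ref = np.asarray(ref, dtype=np.float64)
    est = np.asarray(est, dtype=np.float64)
    if ref.shape != est.shape or ref.ndim != 1:
        raise ValueError("ref and est must be 1-D arrays of equal length")
    if len(ref) < n_fft:
        raise ValueError("signals must contain at least n_fft samples")

    window = np.hanning(n_fft)
    n_frames = 1 + (len(ref) - n_fft) // hop

    def power_spectrogram(x):
        # Slice overlapping frames, apply the window, take the one-sided FFT.
        frames = np.stack(
            [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
        )
        return np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Log power spectra in dB; eps guards against log of zero.
    log_ref = 10.0 * np.log10(power_spectrogram(ref) + eps)
    log_est = 10.0 * np.log10(power_spectrogram(est) + eps)

    # RMS over frequency bins within each frame, then mean over frames.
    per_frame = np.sqrt(np.mean((log_ref - log_est) ** 2, axis=1))
    return float(np.mean(per_frame))
```

Under this definition, identical signals give a distance of zero, and scaling a signal's amplitude by a factor of 10 gives about 20 dB wherever the spectral power is well above `eps`. Restricting the frequency-bin average to the bins covering 1-3 kHz would give an overtone-region variant of the same measure.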
