Accentbox: Towards high-fidelity zero-shot accent generation
4 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (4)
Citing papers explorer
- From Dispersion to Attraction: Spectral Dynamics of Hallucination Across Whisper Model Scales
  The Spectral Sensitivity Theorem identifies a phase transition in Whisper models where scaling causes self-attention to collapse into rank-1 attractors, decoupling output from acoustic evidence.
- NeuralLVC: Neural Lossless Video Compression via Masked Diffusion with Temporal Conditioning
  NeuralLVC achieves better lossless compression than H.264 and H.265 on video sequences by combining masked diffusion with temporal conditioning on frame differences.
- A Data Efficiency Study of Synthetic Fog for Object Detection Using the Clear2Fog Pipeline
  Clear2Fog generates realistic synthetic fog from clear scenes, enabling mixed-density training that outperforms full fixed-density data and improves real-world performance by 1.67 mAP after learning-rate adjustment.
- Few-Shot Accent Synthesis for ASR with LLM-Guided Phoneme Editing
  Few-shot TTS adaptation combined with LLM-guided phoneme editing produces synthetic accented speech that improves ASR word error rates on real accented audio even in cross-speaker and ultra-low-data settings.