AffectCodec is an emotion-guided neural speech codec that preserves emotional cues during quantization while maintaining semantic fidelity and prosodic naturalness.
hub
Advances in neural information processing systems , volume=
11 Pith papers cite this work. Polarity classification is still indexing.
hub tools
years
2026 11representative citing papers
NeuralBench is a new benchmarking framework for neuroAI models on EEG data that finds foundation models only marginally outperform task-specific ones while many tasks like cognitive decoding stay highly challenging.
TRIBE v2 is a multimodal AI model that predicts human brain activity more accurately than linear encoding models and recovers established neuroscientific findings through in-silico testing.
Defines ATIR task and benchmark for mixed audio-text queries; MLLM model with token compression shows substantial gains over strong baselines.
Introduces the Indic-CodecFake dataset for Indic codec deepfakes and SATYAM, a novel hyperbolic ALM that outperforms baselines through dual-stage semantic-prosodic fusion using Bhattacharya distance.
NOVA-ARC is a hyperbolic geometry framework that transfers emotion supervision from labeled non-verbal vocalizations to unlabeled verbal speech in multiple languages via optimal transport prototype alignment and consistency regularization.
Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.
VST combines deep-learning stuttering classification with multi-agent LLM reasoning to produce clinician-reviewed, evidence-based therapy plans for stuttering.
HCFD is a new pathology-aware benchmark and dataset for codec-fake audio detection in healthcare, with PHOENIX-Mamba achieving up to 97% accuracy by modeling fakes as modes in hyperbolic space.
A structured diffusion bridge method achieves near fully-paired modality translation quality using alignment constraints even in unpaired or semi-paired regimes.
A tutorial synthesizing foundations, recent models such as PALO and Maya, and low-cost methods for tri-modal multilingual AI in resource-constrained settings.
citing papers explorer
-
AffectCodec: Emotion-Preserving Neural Speech Codec for Expressive Speech Modeling
AffectCodec is an emotion-guided neural speech codec that preserves emotional cues during quantization while maintaining semantic fidelity and prosodic naturalness.
-
NeuralBench: A Unifying Framework to Benchmark NeuroAI Models
NeuralBench is a new benchmarking framework for neuroAI models on EEG data that finds foundation models only marginally outperform task-specific ones while many tasks like cognitive decoding stay highly challenging.
-
A foundation model of vision, audition, and language for in-silico neuroscience
TRIBE v2 is a multimodal AI model that predicts human brain activity more accurately than linear encoding models and recovers established neuroscientific findings through in-silico testing.
-
ATIR: Towards Audio-Text Interleaved Contextual Retrieval
Defines ATIR task and benchmark for mixed audio-text queries; MLLM model with token compression shows substantial gains over strong baselines.
-
Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages
Introduces the Indic-CodecFake dataset for Indic codec deepfakes and SATYAM, a novel hyperbolic ALM that outperforms baselines through dual-stage semantic-prosodic fusion using Bhattacharya distance.
-
Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition
NOVA-ARC is a hyperbolic geometry framework that transfers emotion supervision from labeled non-verbal vocalizations to unlabeled verbal speech in multiple languages via optimal transport prototype alignment and consistency regularization.
-
Uncovering the Latent Potential of Deep Intermediate Representations
Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.
-
Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent for Personalized and Supervised Therapy
VST combines deep-learning stuttering classification with multi-agent LLM reasoning to produce clinician-reviewed, evidence-based therapy plans for stuttering.
-
HCFD: A Benchmark for Audio Deepfake Detection in Healthcare
HCFD is a new pathology-aware benchmark and dataset for codec-fake audio detection in healthcare, with PHOENIX-Mamba achieving up to 97% accuracy by modeling fakes as modes in hyperbolic space.
-
Structured Diffusion Bridges: Inductive Bias for Denoising Diffusion Bridges
A structured diffusion bridge method achieves near fully-paired modality translation quality using alignment constraints even in unpaired or semi-paired regimes.
-
Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages
A tutorial synthesizing foundations, recent models such as PALO and Maya, and low-cost methods for tri-modal multilingual AI in resource-constrained settings.