hub

Advances in neural information processing systems , volume=

wav2vec 2

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

browse 11 citing papers

hub tools

JSON dossier citing papers JSON

representative citing papers

AffectCodec: Emotion-Preserving Neural Speech Codec for Expressive Speech Modeling

cs.SD · 2026-05-11 · unverdicted · novelty 7.0

AffectCodec is an emotion-guided neural speech codec that preserves emotional cues during quantization while maintaining semantic fidelity and prosodic naturalness.

NeuralBench: A Unifying Framework to Benchmark NeuroAI Models

cs.LG · 2026-05-08 · conditional · novelty 7.0

NeuralBench is a new benchmarking framework for neuroAI models on EEG data that finds foundation models only marginally outperform task-specific ones while many tasks like cognitive decoding stay highly challenging.

A foundation model of vision, audition, and language for in-silico neuroscience

q-bio.NC · 2026-05-05 · unverdicted · novelty 7.0

TRIBE v2 is a multimodal AI model that predicts human brain activity more accurately than linear encoding models and recovers established neuroscientific findings through in-silico testing.

ATIR: Towards Audio-Text Interleaved Contextual Retrieval

cs.SD · 2026-04-22 · unverdicted · novelty 7.0

Defines ATIR task and benchmark for mixed audio-text queries; MLLM model with token compression shows substantial gains over strong baselines.

Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages

eess.AS · 2026-04-21 · unverdicted · novelty 7.0

Introduces the Indic-CodecFake dataset for Indic codec deepfakes and SATYAM, a novel hyperbolic ALM that outperforms baselines through dual-stage semantic-prosodic fusion using Bhattacharya distance.

Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition

eess.AS · 2026-04-19 · unverdicted · novelty 7.0

NOVA-ARC is a hyperbolic geometry framework that transfers emotion supervision from labeled non-verbal vocalizations to unlabeled verbal speech in multiple languages via optimal transport prototype alignment and consistency regularization.

Uncovering the Latent Potential of Deep Intermediate Representations

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.

Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent for Personalized and Supervised Therapy

cs.AI · 2026-05-01 · conditional · novelty 6.0

VST combines deep-learning stuttering classification with multi-agent LLM reasoning to produce clinician-reviewed, evidence-based therapy plans for stuttering.

HCFD: A Benchmark for Audio Deepfake Detection in Healthcare

eess.AS · 2026-04-19 · unverdicted · novelty 6.0

HCFD is a new pathology-aware benchmark and dataset for codec-fake audio detection in healthcare, with PHOENIX-Mamba achieving up to 97% accuracy by modeling fakes as modes in hyperbolic space.

Structured Diffusion Bridges: Inductive Bias for Denoising Diffusion Bridges

cs.LG · 2026-05-03 · unverdicted · novelty 5.0 · 2 refs

A structured diffusion bridge method achieves near fully-paired modality translation quality using alignment constraints even in unpaired or semi-paired regimes.

Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages

cs.CL · 2026-05-16 · unverdicted · novelty 2.0

A tutorial synthesizing foundations, recent models such as PALO and Maya, and low-cost methods for tri-modal multilingual AI in resource-constrained settings.

citing papers explorer

Showing 11 of 11 citing papers.

AffectCodec: Emotion-Preserving Neural Speech Codec for Expressive Speech Modeling cs.SD · 2026-05-11 · unverdicted · none · ref 38
AffectCodec is an emotion-guided neural speech codec that preserves emotional cues during quantization while maintaining semantic fidelity and prosodic naturalness.
NeuralBench: A Unifying Framework to Benchmark NeuroAI Models cs.LG · 2026-05-08 · conditional · none · ref 195
NeuralBench is a new benchmarking framework for neuroAI models on EEG data that finds foundation models only marginally outperform task-specific ones while many tasks like cognitive decoding stay highly challenging.
A foundation model of vision, audition, and language for in-silico neuroscience q-bio.NC · 2026-05-05 · unverdicted · none · ref 117
TRIBE v2 is a multimodal AI model that predicts human brain activity more accurately than linear encoding models and recovers established neuroscientific findings through in-silico testing.
ATIR: Towards Audio-Text Interleaved Contextual Retrieval cs.SD · 2026-04-22 · unverdicted · none · ref 16
Defines ATIR task and benchmark for mixed audio-text queries; MLLM model with token compression shows substantial gains over strong baselines.
Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages eess.AS · 2026-04-21 · unverdicted · none · ref 53
Introduces the Indic-CodecFake dataset for Indic codec deepfakes and SATYAM, a novel hyperbolic ALM that outperforms baselines through dual-stage semantic-prosodic fusion using Bhattacharya distance.
Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition eess.AS · 2026-04-19 · unverdicted · none · ref 28
NOVA-ARC is a hyperbolic geometry framework that transfers emotion supervision from labeled non-verbal vocalizations to unlabeled verbal speech in multiple languages via optimal transport prototype alignment and consistency regularization.
Uncovering the Latent Potential of Deep Intermediate Representations cs.LG · 2026-05-21 · unverdicted · none · ref 65
Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.
Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent for Personalized and Supervised Therapy cs.AI · 2026-05-01 · conditional · none · ref 21
VST combines deep-learning stuttering classification with multi-agent LLM reasoning to produce clinician-reviewed, evidence-based therapy plans for stuttering.
HCFD: A Benchmark for Audio Deepfake Detection in Healthcare eess.AS · 2026-04-19 · unverdicted · none · ref 44
HCFD is a new pathology-aware benchmark and dataset for codec-fake audio detection in healthcare, with PHOENIX-Mamba achieving up to 97% accuracy by modeling fakes as modes in hyperbolic space.
Structured Diffusion Bridges: Inductive Bias for Denoising Diffusion Bridges cs.LG · 2026-05-03 · unverdicted · none · ref 57 · 2 links
A structured diffusion bridge method achieves near fully-paired modality translation quality using alignment constraints even in unpaired or semi-paired regimes.
Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages cs.CL · 2026-05-16 · unverdicted · none · ref 236
A tutorial synthesizing foundations, recent models such as PALO and Maya, and low-cost methods for tri-modal multilingual AI in resource-constrained settings.

Advances in neural information processing systems , volume=

hub tools

fields

years

verdicts

representative citing papers

citing papers explorer