Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition

Pete Warden · 2018 · cs.CL · arXiv 1804.03209

Tool reference. 83% of classified Pith citations use this work as a method, library, or software dependency, not as a substantive claim.

53 Pith papers citing it

abstract

Describes an audio dataset of spoken words designed to help train and evaluate keyword spotting systems. Discusses why this task is an interesting challenge, and why it requires a specialized dataset that is different from conventional datasets used for automatic speech recognition of full sentences. Suggests a methodology for reproducible and comparable accuracy metrics for this task. Describes how the data was collected and verified, what it contains, previous versions and properties. Concludes by reporting baseline results of models trained on this dataset.

citation-role summary

dataset: 5 · background: 1

citation-polarity summary

use dataset: 5 · unclear: 1

representative citing papers

DiffWave: A Versatile Diffusion Model for Audio Synthesis

eess.AS · 2020-09-21 · unverdicted · novelty 8.0

DiffWave is a non-autoregressive diffusion model that generates high-fidelity audio waveforms from noise in a constant number of steps, matching WaveNet vocoder quality while being orders of magnitude faster and outperforming prior models in unconditional generation.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

cs.LG · 2023-12-01 · unverdicted · novelty 8.0

Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.

Efficiently Modeling Long Sequences with Structured State Spaces

cs.LG · 2021-10-31 · unverdicted · novelty 8.0

S4 is an efficient state space sequence model that captures long-range dependencies via structured parameterization of the SSM, achieving state-of-the-art results on the Long Range Arena and other benchmarks while being faster than Transformers for generation.

LongSpike: Fractional Order Spiking State Space Models for Efficient Long Sequence Learning

cs.LG · 2026-06-11 · unverdicted · novelty 7.0

LongSpike integrates fractional-order state-space modeling into spiking neural networks, enabling better long-sequence performance than prior SNNs on LRA, WikiText-103, and Speech Commands benchmarks while retaining sparse computation.

Attention by Synchronization in Coupled Oscillator Networks

cs.LG · 2026-06-10 · unverdicted · novelty 7.0

Kuramoto synchronization dynamics implement a provably unique and globally attractive attention mechanism that replaces softmax for physical substrates and shows competitive empirical performance.

Totoro$^+$: An Adaptive and Scalable Edge Federated Learning System

cs.DC · 2026-05-25 · unverdicted · novelty 7.0

Totoro+ is a DHT-based fully decentralized FL system with locality-aware multi-ring P2P structure, pub/sub forest, and game-theoretic path planning that claims O(log N) hops and 1.2-14x speedup for many concurrent applications on edge nodes.

Covariance Estimation for Matrix-variate Data via Fixed-rank Core Covariance Geometry

math.DG · 2025-11-30 · unverdicted · novelty 7.0

The space of rank-r core covariances forms a smooth manifold except on a measure-zero set, enabling a partial-isotropy shrinkage estimator for matrix-variate data.

DASB - Discrete Audio and Speech Benchmark

cs.SD · 2024-06-20 · unverdicted · novelty 7.0

DASB is a new benchmark for discrete audio tokens showing semantic tokens outperform acoustic ones but discrete representations remain less robust than continuous features across domains.

Wake Vision: A Tailored Dataset and Benchmark Suite for TinyML Computer Vision Applications

cs.CV · 2024-05-01 · unverdicted · novelty 7.0

Wake Vision pipeline produces a 6M-image person detection dataset for TinyML with 2.2% label error, improving model accuracy up to 6.6% over prior VWW benchmark across architectures and subsets.

FiTS: Interpretable Spiking Neurons via Frequency Selectivity and Temporal Shaping

cs.NE · 2026-05-13 · unverdicted · novelty 7.0

FiTS spiking neurons improve auditory task performance over LIF baselines by factorizing computation into frequency selectivity and group-delay-based temporal shaping, yielding interpretable per-neuron parameters.

End-to-End Keyword Spotting on FPGA Using Graph Neural Networks with a Neuromorphic Auditory Sensor

cs.LG · 2026-05-10 · conditional · novelty 7.0

An FPGA implementation of a neuromorphic auditory sensor plus graph neural network achieves 87.43% accuracy on Google Speech Commands v2 with sub-35 µs latency and 1.12 W power.

MMEB-V3: Measuring the Performance Gaps of Omni-Modality Embedding Models

cs.IR · 2026-04-25 · unverdicted · novelty 7.0

MMEB-V3 benchmark shows omni-modality embedding models fail to enforce instruction-specified modality constraints and exhibit asymmetric, query-biased retrieval.

End-to-End Voice Intent Recognition for Spontaneous Human-Drone Interaction with Naive Users

eess.AS · 2026-06-19 · unverdicted · novelty 6.0

An end-to-end SLU architecture with frozen SSL acoustic encoder, LSTM classification head, and cross-modal distillation achieves 93% accuracy on simple commands and 82% on spontaneous speech at 7 ms latency on the new VoiceStick corpus, outperforming cascade baselines.

Exploiting Neural Audio Codec Latents for Adversarial Audio Attacks

cs.SD · 2026-06-18 · unverdicted · novelty 6.0

A conditional generator operating in neural audio codec latent space produces targeted adversarial audio examples in one forward pass, reaching up to 99% success rate at sub-7 ms inference.

Adaptive Speech-to-Spike Encoding for Spiking Neural Networks

cs.NE · 2026-06-17 · unverdicted · novelty 6.0

A learnable residual speech-to-spike encoder jointly trained with an R-LIF SNN achieves up to 94.97% accuracy on GSC-v2 with a 35k-parameter model and supports DFA credit assignment at 91.5%.

NeuralMUSIC: A Hybrid Neural-Subspace Framework for Robot Sound Source Localization

cs.SD · 2026-06-17 · unverdicted · novelty 6.0 · 2 refs

NeuralMUSIC combines neural covariance estimation with the MUSIC pipeline, frequency attention fusion, and self-supervised learning to improve direction-of-arrival estimation for robotic sound source localization.

Representation Matters in Randomized Smoothing for Audio Classification

eess.AS · 2026-06-02 · unverdicted · novelty 6.0

Randomized smoothing in audio classification requires explicit specification of the certified representation and preprocessing because different choices produce different certified accuracies and effective perturbation scales even at identical noise levels on keyword spotting and environmental sound

What changes after deployment? A survey on On-device Learning in TinyML

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

A survey of on-device learning in TinyML organized by distribution change regimes, highlighting influences on applications, hardware, and solutions plus a gap between benchmarks and deployments.

Plug-in Losses for Evidential Deep Learning: A Simplified Framework for Uncertainty Estimation that Includes the Softmax Classifier

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Plug-in losses approximate EDL training objectives at the Dirichlet mean with decaying error as evidence grows, including softmax under a specific mapping, and match classical EDL performance on Google Speech Commands.

AudioMosaic: Contrastive Masked Audio Representation Learning

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

AudioMosaic learns general-purpose audio representations through contrastive pre-training with structured spectrogram masking, reaching state-of-the-art results on standard benchmarks and improving audio-language tasks.

ComMark: Covert and Robust Black-Box Model Watermarking with Compressed Samples

cs.CR · 2025-12-16 · unverdicted · novelty 6.0

ComMark embeds covert watermarks in models using frequency-domain compressed samples and simulated attacks, claiming state-of-the-art covertness and robustness across image, speech, text, and video tasks.

AaSP: Aliasing-aware Self-Supervised Pre-Training for Audio Spectrogram Transformers

cs.SD · 2025-12-03 · unverdicted · novelty 6.0

AaSP learns aliasing-stable audio representations by augmenting patch tokens with adaptive subband features from alias-prone bands and using teacher-student masked modeling plus multi-mask contrastive regularization, reaching SOTA on AS-20K, ESC-50, and NSynth under fine-tuning.

SiLIF: Structured State Space Model Dynamics and Parametrization for Spiking Neural Networks

cs.NE · 2025-06-04 · unverdicted · novelty 6.0

SiLIF models apply SSM dynamics and parametrization to spiking neurons for stable training, reaching new SOTA on event-based and raw-audio speech datasets while using half the compute of SSMs via synaptic delays.

Simplified State Space Layers for Sequence Modeling

cs.LG · 2022-08-09 · accept · novelty 6.0

S5 uses a single MIMO state space model with S4-derived initialization to match S4 efficiency and reach 87.4% average accuracy on the Long Range Arena benchmark.

citing papers explorer

Showing 1 of 1 citing paper after filters.

DRL-CLBA: A Clean Label Backdoor Attack for Speech Classification via DDPG Reinforcement Learning

cs.AI · 2026-07-02 · unverdicted · none · ref 39 · internal anchor

DRL-CLBA applies DDPG reinforcement learning and deep audio steganography to create sample-specific clean-label backdoor attacks on speech DNNs that resist fine-tuning, pruning, and spectral signature defenses.
