Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition

Tool reference. 83% of classified Pith citations use this work as a method, library, or software dependency, not as a substantive claim.

53 Pith papers citing it
Method reference: 83% of classified citations
abstract

Describes an audio dataset of spoken words designed to help train and evaluate keyword spotting systems. Discusses why this task is an interesting challenge, and why it requires a specialized dataset that is different from conventional datasets used for automatic speech recognition of full sentences. Suggests a methodology for reproducible and comparable accuracy metrics for this task. Describes how the data was collected and verified, what it contains, previous versions and properties. Concludes by reporting baseline results of models trained on this dataset.

citation-role summary

dataset: 5 · background: 1

representative citing papers

DiffWave: A Versatile Diffusion Model for Audio Synthesis

eess.AS · 2020-09-21 · unverdicted · novelty 8.0

DiffWave is a non-autoregressive diffusion model that generates high-fidelity audio waveforms from noise in constant steps, matching WaveNet vocoder quality while being orders of magnitude faster and outperforming prior models in unconditional generation.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

cs.LG · 2023-12-01 · unverdicted · novelty 8.0

Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.

Efficiently Modeling Long Sequences with Structured State Spaces

cs.LG · 2021-10-31 · unverdicted · novelty 8.0

S4 is an efficient state space sequence model that captures long-range dependencies via structured parameterization of the SSM, achieving state-of-the-art results on the Long Range Arena and other benchmarks while being faster than Transformers for generation.

Attention by Synchronization in Coupled Oscillator Networks

cs.LG · 2026-06-10 · unverdicted · novelty 7.0

Kuramoto synchronization dynamics implement a provably unique and globally attractive attention mechanism that replaces softmax for physical substrates and shows competitive empirical performance.

Totoro+: An Adaptive and Scalable Edge Federated Learning System

cs.DC · 2026-05-25 · unverdicted · novelty 7.0

Totoro+ is a DHT-based fully decentralized FL system with locality-aware multi-ring P2P structure, pub/sub forest, and game-theoretic path planning that claims O(log N) hops and 1.2-14x speedup for many concurrent applications on edge nodes.

DASB - Discrete Audio and Speech Benchmark

cs.SD · 2024-06-20 · unverdicted · novelty 7.0

DASB is a new benchmark for discrete audio tokens showing semantic tokens outperform acoustic ones but discrete representations remain less robust than continuous features across domains.

Adaptive Speech-to-Spike Encoding for Spiking Neural Networks

cs.NE · 2026-06-17 · unverdicted · novelty 6.0

A learnable residual speech-to-spike encoder jointly trained with an R-LIF SNN achieves up to 94.97% accuracy on GSC-v2 with a 35k-parameter model and supports DFA credit assignment at 91.5%.

Representation Matters in Randomized Smoothing for Audio Classification

eess.AS · 2026-06-02 · unverdicted · novelty 6.0

Randomized smoothing in audio classification requires explicit specification of the certified representation and preprocessing because different choices produce different certified accuracies and effective perturbation scales even at identical noise levels on keyword spotting and environmental sound

AudioMosaic: Contrastive Masked Audio Representation Learning

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

AudioMosaic learns general-purpose audio representations through contrastive pre-training with structured spectrogram masking, reaching state-of-the-art results on standard benchmarks and improving audio-language tasks.

Simplified State Space Layers for Sequence Modeling

cs.LG · 2022-08-09 · accept · novelty 6.0

S5 uses a single MIMO state space model with S4-derived initialization to match S4 efficiency and reach 87.4% average accuracy on the Long Range Arena benchmark.
