Ensembits is the first tokenizer of protein conformational ensembles that outperforms static tokenizers on RMSF prediction and matches them on function and mutation tasks while using less pretraining data.
Soundstream: An end-to-end neural audio codec
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
CodecAttack perturbs audio in codec latent space with multi-bitrate EoT to achieve 85.5% average ASR on Opus-compressed Audio LLMs versus under 26% for waveform baselines, with transfer to MP3 and AAC.
AMAR uses a transformer with learnable query embeddings for set-based prediction of concurrent activities from composite Wi-Fi CSI, combined with edge feature extraction and vector quantization for bandwidth-efficient deployment.
A jointly learned hierarchical index with cross-attention and residual quantization scales exact retrieval in foundational recommendation models, deployed at Meta with additional performance from test-time training on index nodes.
FAST applies discrete cosine transform to robot action sequences for efficient tokenization, enabling autoregressive VLAs to succeed on high-frequency dexterous tasks and scale to 10k hours of data while matching diffusion VLA performance with up to 5x faster training.
Woosh is a new publicly released foundation model optimized for high-quality sound effect generation from text or video, showing competitive or better results than open alternatives like Stable Audio Open.
The paper introduces Musical Attention, an attention variant that incorporates eight musical features including metadata to generate more coherent and varied music than standard or strided attention baselines.
citing papers explorer
-
ENSEMBITS: an alphabet of protein conformational ensembles
Ensembits is the first tokenizer of protein conformational ensembles that outperforms static tokenizers on RMSF prediction and matches them on function and mutation tasks while using less pretraining data.
-
Codec-Robust Attacks on Audio LLMs
CodecAttack perturbs audio in codec latent space with multi-bitrate EoT to achieve 85.5% average ASR on Opus-compressed Audio LLMs versus under 26% for waveform baselines, with transfer to MP3 and AAC.
-
AMAR: Lightweight Attention-Based Multi-User Activity Recognition from Wi-Fi CSI
AMAR uses a transformer with learnable query embeddings for set-based prediction of concurrent activities from composite Wi-Fi CSI, combined with edge feature extraction and vector quantization for bandwidth-efficient deployment.
-
Efficient Retrieval Scaling with Hierarchical Indexing for Large Scale Recommendation
A jointly learned hierarchical index with cross-attention and residual quantization scales exact retrieval in foundational recommendation models, deployed at Meta with additional performance from test-time training on index nodes.
-
FAST: Efficient Action Tokenization for Vision-Language-Action Models
FAST applies discrete cosine transform to robot action sequences for efficient tokenization, enabling autoregressive VLAs to succeed on high-frequency dexterous tasks and scale to 10k hours of data while matching diffusion VLA performance with up to 5x faster training.
-
Woosh: A Sound Effects Foundation Model
Woosh is a new publicly released foundation model optimized for high-quality sound effect generation from text or video, showing competitive or better results than open alternatives like Stable Audio Open.
-
Musical Attention Transformer: Music Generation Using a Music-Specific Attention Model
The paper introduces Musical Attention, an attention variant that incorporates eight musical features including metadata to generate more coherent and varied music than standard or strided attention baselines.