Non-Autoregressive Neural Machine Translation
14 Pith papers cite this work.
abstract
Existing approaches to neural machine translation condition each output word on previously generated outputs. We introduce a model that avoids this autoregressive property and produces its outputs in parallel, allowing an order of magnitude lower latency during inference. Through knowledge distillation, the use of input token fertilities as a latent variable, and policy gradient fine-tuning, we achieve this at a cost of as little as 2.0 BLEU points relative to the autoregressive Transformer network used as a teacher. We demonstrate substantial cumulative improvements associated with each of the three aspects of our training strategy, and validate our approach on IWSLT 2016 English-German and two WMT language pairs. By sampling fertilities in parallel at inference time, our non-autoregressive model achieves near-state-of-the-art performance of 29.8 BLEU on WMT 2016 English-Romanian.
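The abstract names input token fertilities as a latent variable but does not spell out how they are used. As a rough illustration only, the sketch below assumes each source token is copied into the decoder input as many times as its fertility, so that the output length is fixed before decoding and every position can be filled in parallel. The function name `copy_by_fertility` and the example sentence and fertility values are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def copy_by_fertility(source_tokens: Sequence[str],
                      fertilities: Sequence[int]) -> List[str]:
    """Repeat each source token `fertility` times (hypothetical helper).

    Assumption: the decoder input is built by copying source tokens, so the
    target length equals the sum of the fertilities.
    """
    if len(source_tokens) != len(fertilities):
        raise ValueError("need exactly one fertility per source token")
    if any(f < 0 for f in fertilities):
        raise ValueError("fertilities must be non-negative")

    decoder_inputs: List[str] = []
    for token, fertility in zip(source_tokens, fertilities):
        # A fertility of 0 drops the token; 2 or more duplicates it.
        decoder_inputs.extend([token] * fertility)
    return decoder_inputs


# Made-up example values, for illustration only.
source = ["we", "totally", "accept", "it", "."]
fertilities = [1, 1, 2, 0, 1]
decoder_inputs = copy_by_fertility(source, fertilities)
print(decoder_inputs)
```

Because the length of `decoder_inputs` is known once the fertilities are chosen, a decoder would not need to generate left to right to discover where the sentence ends; sampling several fertility sequences would simply yield several candidate inputs of this kind.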
citing papers explorer
- Bastion: Budget-Aware Speculative Decoding with Tree-structured Block Diffusion Drafting
  BASTION is a budget-aware speculative decoding framework with adaptive tree-structured block diffusion drafting that reports up to 6.61x speedup and 39% improvement over block-diffusion baselines.
- Discrete Stochastic Localization for Non-autoregressive Generation
  Discrete Stochastic Localization lets a single trained network support an entire family of per-token SNR paths for discrete sequence generation, with masked diffusion as a special case, and improves MAUVE scores when fine-tuning pretrained checkpoints.
- Massive Activations in Large Language Models
  Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.
- HapticLDM: A Diffusion Model for Text-to-Vibrotactile Generation
  HapticLDM is the first latent diffusion model that generates vibrotactile signals directly from text, using dynamic text curation and global denoising to improve realism and semantic alignment over autoregressive baselines.
- PlayGen-MoG: Framework for Diverse Multi-Agent Play Generation via Mixture-of-Gaussians Trajectory Prediction
  PlayGen-MoG uses a shared Mixture-of-Gaussians head across agents plus relative attention to generate diverse coordinated plays from a single static formation, achieving 1.68 yard ADE and 3.98 yard FDE with full mixture utilization on football data.
- Which Tokens Need Context? A Reference-Based Analysis of Translation Responsibility Using Fertility and Entropy
  A post-hoc framework using fertility and entropy from word alignments on reference translations shows context redistributes responsibility to context tokens for function words but not content words across three language pairs.
- Flow Map Language Models: One-step Language Modeling via Continuous Denoising
  Continuous flows on token embeddings with flow-map distillation produce one-step language models whose quality exceeds recent 8-step discrete diffusion baselines on LM1B and OpenWebText.
- Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation
  Reinforce-NAT and FS-decoder retrieve target sequential information for non-autoregressive translation, yielding higher BLEU than baseline NAT while preserving fast decoding and approaching autoregressive quality.
- BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion
  BitLM replaces per-token softmax with bitwise continuous diffusion inside causal blocks to generate multiple tokens in parallel while preserving autoregressive structure.
- Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts
  Reasoning language models extract answers from sparse, order-shuffled chain-of-thought traces with little accuracy loss.
- VaaWIT: Visual-Aware Adaptation of Large Language Models for Multilingual Web Image Translation
  VaaWIT proposes DSAM and VAA modules to adapt LLMs for multilingual web image translation, claiming outperformance over open-source baselines on benchmarks.
- Continuous diffusion for categorical data
  The paper proposes CDCD, a continuous-time and continuous-space diffusion framework for categorical data, and reports results on language modeling tasks.
- Attending to Emotional Narratives
  Transformer and Memory Fusion Network attention mechanisms generalize to multimodal time-series emotion recognition on emotional autobiographical narratives, achieving performance comparable to human raters in some cases.
- Sequence Generation: From Both Sides to the Middle
  The SBSG model generates sequences bidirectionally from the ends to the middle via interactive attention, claiming faster decoding and better quality than the autoregressive Transformer on NMT and summarization tasks.