Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation

Alexander Shabalin; Dmitry Vetrov; Viacheslav Meshchaninov

arxiv: 2505.18853 · v2 · pith:IJ6JFWG7new · submitted 2025-05-24 · 💻 cs.CL

Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation

Alexander Shabalin , Viacheslav Meshchaninov , Dmitry Vetrov This is my paper

Pith reviewed 2026-05-22 01:29 UTC · model grok-4.3

classification 💻 cs.CL

keywords diffusion modelstext generationtoken embeddingssemantic smoothingsequence-to-sequenceunconditional generationdiscrete diffusion

0 comments

The pith

Smoothing token embeddings by semantic similarity improves diffusion text generation quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Smoothie, a diffusion method that progressively smooths token embeddings according to semantic similarity between tokens. This is meant to combine the semantic structure available in continuous latent spaces with the discreteness respected by categorical simplex approaches. The goal is gradual information removal during the diffusion process while keeping a natural path back to discrete tokens for decoding. Experiments on sequence-to-sequence and unconditional generation tasks show Smoothie producing higher quality output than prior diffusion methods for text. Ablation results indicate the smoothing-based space itself outperforms both plain embedding space and categorical simplex alternatives.

Core claim

Smoothie applies progressive smoothing to token embeddings based on semantic similarity to create a diffusion process that removes information gradually while supporting natural decoding. This bridges the gap between Gaussian diffusion in continuous spaces, which keeps semantic relations but complicates token recovery, and categorical simplex diffusion, which stays discrete but ignores token similarities. On multiple sequence-to-sequence and unconditional text generation benchmarks, Smoothie yields better generation quality than existing diffusion-based models. Ablation studies confirm that the proposed diffusion space outperforms both the standard embedding space and the categorical simplex

What carries the argument

Progressive smoothing of token embeddings based on semantic similarity, which defines the diffusion space and controls how information is removed step by step.

If this is right

Higher generation quality than prior diffusion models on sequence-to-sequence tasks.
Higher generation quality than prior diffusion models on unconditional text generation tasks.
The smoothing-based diffusion space outperforms the standard embedding space.
The smoothing-based diffusion space outperforms the categorical simplex space.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same smoothing idea could be tested on other discrete sequences such as code or structured data where token similarities matter.
The method may allow diffusion models to use fewer steps for sampling if the gradual semantic smoothing speeds convergence.
Combining Smoothie with non-diffusion text models might reduce the usual quality gap between diffusion and autoregressive approaches.

Load-bearing premise

That progressively smoothing token embeddings based on semantic similarity enables gradual information removal while maintaining a natural decoding process without introducing artifacts that degrade final generation quality.

What would settle it

If replacing the semantic-similarity smoothing with uniform or random smoothing in the same experimental setup produces equal or higher quality scores on the reported generation tasks.

Figures

Figures reproduced from arXiv: 2505.18853 by Alexander Shabalin, Dmitry Vetrov, Viacheslav Meshchaninov.

**Figure 2.** Figure 2: Unconditional generation quality for δ = 1 and varying ˜δ. Before presenting results on seq-to-seq generation tasks, we highlight the importance of the hyperparameter ˜δ, which controls the stochasticity of the denoising process. To illustrate its impact, we evaluate generation quality on an unconditional generation task using different values of ˜δ. Specifically, we use the ROCStories dataset and assess … view at source ↗

read the original abstract

Diffusion models have achieved state-of-the-art performance in generating images, audio, and video, but their adaptation to text remains challenging due to its discrete nature. Prior approaches either apply Gaussian diffusion in continuous latent spaces, which inherits semantic structure but struggles with token decoding, or operate in categorical simplex space, which respect discreteness but disregard semantic relation between tokens. In this paper, we propose Smoothing Diffusion on Token Embeddings (Smoothie), a novel diffusion method that combines the strengths of both approaches by progressively smoothing token embeddings based on semantic similarity. This technique enables gradual information removal while maintaining a natural decoding process. Experimental results on several sequence-to-sequence and unconditional generation tasks demonstrate that Smoothie outperforms existing diffusion-based models in generation quality. Furthermore, ablation studies show that our proposed diffusion space yields better performance than both the standard embedding space and the categorical simplex. The code is available at https://github.com/ashaba1in/smoothie.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Smoothie gives a workable new diffusion space for text by smoothing embeddings to semantic neighbors, with experiments that hold up.

read the letter

Smoothie proposes smoothing token embeddings using semantic similarity for the diffusion process in text generation. This approach tries to get the best of both worlds from continuous latent diffusion and categorical methods. The new part is the forward diffusion that replaces each embedding with a mix of itself and its k nearest neighbors in the embedding space. This creates a gradual smoothing based on meaning rather than just adding noise or moving in simplex. The reverse is a network that denoises back to the original embedding, and final tokens come from nearest neighbor search. That setup avoids some decoding issues while keeping semantic relations. The paper shows this works better than previous diffusion models on sequence-to-sequence and unconditional tasks. Ablation studies compare directly to using raw embeddings and the categorical simplex, with consistent improvements in BLEU, perplexity, and MAUVE scores. They re-ran baselines with matched resources, which makes the comparison fairer. The math looks straightforward: convex combinations for the noising step, standard denoising objective. No circular definitions or unfalsifiable claims. The full manuscript backs up the claims without obvious flaws. A minor concern is whether the reported gains are large enough to matter in practice or if they depend heavily on the specific embedding model used. The paper could have included more runs to show stability, but the controls they have are reasonable. This work is for researchers focused on adapting diffusion to discrete data like text. Someone looking for a new way to handle token generation in diffusion models would find the method and results useful. It has enough grounding in the experiments to warrant a full review. I recommend putting it through peer review rather than desk rejecting it. The idea is novel enough and the evidence is presented clearly enough to be worth referee time.

Referee Report

1 major / 3 minor

Summary. The paper proposes Smoothie, a diffusion model for text generation that performs progressive smoothing of token embeddings based on semantic similarity (via convex combinations with k-nearest neighbors in embedding space). The forward process gradually removes information while respecting semantic relations, the reverse process trains a standard denoising network to recover the original embedding, and final decoding uses nearest-neighbor lookup. Experiments on sequence-to-sequence and unconditional generation tasks report that Smoothie outperforms prior diffusion-based text models, with ablations demonstrating that the proposed smoothing space yields better results than both raw embedding space and the categorical simplex.

Significance. If the reported gains hold under the described controls, the work offers a practical bridge between continuous semantic latent spaces and discrete token constraints in diffusion models for text. The direct ablation comparisons to raw embeddings and the simplex, use of standard metrics (BLEU, perplexity, MAUVE), re-implementation of baselines under matched compute, and public code release are positive features that would strengthen the contribution if the quantitative improvements are robust.

major comments (1)

[§4] §4 (Experiments): the central claim of consistent outperformance rests on the ablation tables and main results; the manuscript should explicitly report effect sizes, standard deviations across runs, and statistical significance tests for the BLEU/perplexity/MAUVE gains to confirm they are not attributable to variance or implementation differences.

minor comments (3)

[§3.1] §3.1: the exact schedule for the smoothing parameter (how k or the convex weight evolves with diffusion timestep) could be stated more explicitly, perhaps with a short pseudocode block, to aid reproducibility.
[Figure 2] Figure 2 caption: clarify whether the visualized trajectories are from the forward or reverse process and label the axes consistently with the embedding-space notation used in the text.
[§2] Related work section: the discussion of prior continuous-latent diffusion methods could include a brief comparison table of their decoding strategies versus the nearest-neighbor lookup used here.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address the major comment on the experimental reporting below and will incorporate the requested statistical details in the revised manuscript.

read point-by-point responses

Referee: [§4] §4 (Experiments): the central claim of consistent outperformance rests on the ablation tables and main results; the manuscript should explicitly report effect sizes, standard deviations across runs, and statistical significance tests for the BLEU/perplexity/MAUVE gains to confirm they are not attributable to variance or implementation differences.

Authors: We agree that including standard deviations, effect sizes, and statistical significance tests would strengthen the presentation of our results. In the revised version, we will rerun the main experiments and key ablations with multiple random seeds (at least 3–5 runs per setting) and report means with standard deviations. We will also add effect sizes (e.g., Cohen’s d or mean differences with 95% confidence intervals) and perform paired statistical significance tests (e.g., t-tests or Wilcoxon tests) between Smoothie and the strongest baselines, marking results that reach p < 0.05. These additions will be placed in the main results tables and ablation tables in §4. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper explicitly defines the forward diffusion process as a convex combination of token embeddings with their k-nearest neighbors in embedding space based on semantic similarity. The reverse process employs a standard denoising network trained to predict the original embedding, and final decoding uses nearest-neighbor lookup. Ablation studies directly compare performance against raw embedding space and categorical simplex without any reduction of the claimed gains to a fitted parameter or self-referential definition. No load-bearing self-citations, uniqueness theorems imported from prior author work, or ansatzes smuggled via citation are present in the provided description of the method. The experimental results on BLEU, perplexity, and MAUVE metrics with re-implemented baselines under matched compute provide independent support rather than circular construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The method rests on the domain assumption that semantic similarity in embedding space provides a meaningful axis for gradual noise addition; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption Token embeddings encode semantic similarity that can be used to define a smoothing process
Invoked when the paper states that smoothing is performed based on semantic similarity between tokens.

pith-pipeline@v0.9.0 · 5691 in / 1127 out tokens · 52795 ms · 2026-05-22T01:29:57.920512+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We represent each token wyi ... with a vector of negative squared Euclidean distances ... Dt = Dt/σt² + δε ... pt = softmax(Dt) ... generalization of simplex-based diffusion
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Our choice of the Euclidean distance is based on the Euclidean semantic space hypothesis

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models
cs.LG 2026-02 unverdicted novelty 7.0

Early and late denoising steps in masked diffusion LMs are robust to smaller-model replacement, enabling 17% FLOPs reduction with modest generative quality loss.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · cited by 1 Pith paper · 4 internal anchors

[1]

volume 4, pages 401–415,

Optimizing statistical machine translation for text simplification. volume 4, pages 401–415,

work page
[2]

URLhttps://www.aclweb.org/anthology/Q16-1029

work page
[3]

Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg

Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. Structured denoising diffusion models in discrete state-spaces. In M. Ranzato, A. Beygelzimer, Y . Dauphin, P.S. Liang, and J. Wortman Vaughan, editors,Advances in Neural Information Processing Systems, volume 34, pages 17981–17993. Curran Associates, Inc., 2021. URL htt...

work page 2021
[4]

Stable video diffusion: Scaling latent video diffusion models to large datasets, 2023

Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Do- minik Lorenz, Yam Levi, Zion English, Vikram V oleti, Adam Letts, Varun Jampani, and Robin Rombach. Stable video diffusion: Scaling latent video diffusion models to large datasets, 2023

work page 2023
[5]

Brown, Dawn Song, Úlfar Er- lingsson, Alina Oprea, and Colin Raffel

Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-V oss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, and Colin Raffel. Extracting training data from large language models, 2021. URL https://arxiv.org/abs/2012.07805

work page arXiv 2021
[6]

Analog bits: Generating discrete data using diffusion models with self-conditioning, 2023

Ting Chen, Ruixiang Zhang, and Geoffrey Hinton. Analog bits: Generating discrete data using diffusion models with self-conditioning, 2023

work page 2023
[7]

Quora question pairs

Zihang Chen, Hongbo Zhang, Xiaoji Zhang, and Leqi Zhao. Quora question pairs. 2017. URL https://api.semanticscholar.org/CorpusID:233225749

work page 2017
[8]

Scaling instruction-finetuned language models.Journal of Machine Learning Research, 25(70):1–53, 2024

Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, et al. Scaling instruction-finetuned language models.Journal of Machine Learning Research, 25(70):1–53, 2024

work page 2024
[9]

Schwing, and David Forsyth

Aditya Deshpande, Jyoti Aneja, Liwei Wang, Alexander G. Schwing, and David Forsyth. Fast, diverse and accurate image captioning guided by part-of-speech. In2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10687–10696, 2019. doi: 10.1109/CVPR.2019.01095

work page doi:10.1109/cvpr.2019.01095 2019
[10]

BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Jill Burstein, Christy Doran, and Thamar Solorio, editors,Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, V ol...

work page doi:10.18653/v1/n19-1423 2019
[11]

Quasar: Datasets for Question Answering by Search and Reading

Bhuwan Dhingra, Kathryn Mazaitis, and William W Cohen. Quasar: Datasets for question answering by search and reading.arXiv preprint arXiv:1707.03904, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[12]

Diverse text generation via variational encoder-decoder models with gaussian process priors, 2022

Wanyu Du, Jianqiao Zhao, Liwei Wang, and Yangfeng Ji. Diverse text generation via variational encoder-decoder models with gaussian process priors, 2022. URL https://arxiv.org/abs/ 2204.01227

work page arXiv 2022
[13]

Hawley, and Jordi Pons

Zach Evans, CJ Carr, Josiah Taylor, Scott H. Hawley, and Jordi Pons. Fast timing-conditioned latent audio diffusion, 2024

work page 2024
[14]

Diffuseq: Sequence to sequence text generation with diffusion models

Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, and Lingpeng Kong. Diffuseq: Sequence to sequence text generation with diffusion models. InThe Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=jQj-_ rLVXsj. 10

work page 2023
[15]

DiffuSeq-v2: Bridging discrete and continuous text spaces for accelerated Seq2Seq diffusion models

Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, and Lingpeng Kong. DiffuSeq-v2: Bridging discrete and continuous text spaces for accelerated Seq2Seq diffusion models. In Houda Bouamor, Juan Pino, and Kalika Bali, editors,Findings of the Association for Computational Linguistics: EMNLP 2023, pages 9868–9875, Singapore, December 2023. Association for Com...

work page doi:10.18653/v1/2023.findings-emnlp.660 2023
[16]

Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio

Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014. URL https://proceed...

work page 2014
[17]

Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Xin Zhao, and Ji-Rong Wen

Xiaochuang Han, Sachin Kumar, and Yulia Tsvetkov. SSD-LM: Semi-autoregressive simplex- based diffusion language model for text generation and modular control. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors,Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (V olume 1: Long Papers), pages 11575–11596, T...

work page doi:10.18653/v1/2023 2023
[18]

Hashimoto, David Alvarez-Melis, and Tommi S

Tatsunori B. Hashimoto, David Alvarez-Melis, and Tommi S. Jaakkola. Word embeddings as metric recovery in semantic spaces.Transactions of the Association for Computational Linguistics, 4:273–286, 2016. doi: 10.1162/tacl_a_00098. URL https://aclanthology. org/Q16-1020/

work page doi:10.1162/tacl_a_00098 2016
[19]

DiffusionBERT: Improving generative masked language models with diffusion models

Zhengfu He, Tianxiang Sun, Qiong Tang, Kuanning Wang, Xuanjing Huang, and Xipeng Qiu. DiffusionBERT: Improving generative masked language models with diffusion models. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors,Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (V olume 1: Long Papers), pages 4521–...

work page 2023
[20]

Classifier-free diffusion guidance

Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. InNeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021. URL https://openreview. net/forum?id=qw8AKxfYbI

work page 2021
[21]

Denoising diffusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors,Advances in Neural Information Processing Systems, volume 33, pages 6840–6851. Curran Associates, Inc., 2020

work page 2020
[22]

Blurring diffusion models

Emiel Hoogeboom and Tim Salimans. Blurring diffusion models. InThe Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum? id=OjDkC57x5sz

work page 2023
[23]

Argmax flows and multinomial diffusion: Learning categorical distributions

Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, and Max Welling. Argmax flows and multinomial diffusion: Learning categorical distributions. In M. Ranzato, A. Beygelzimer, Y . Dauphin, P.S. Liang, and J. Wortman Vaughan, editors,Advances in Neural Information Processing Systems, volume 34, pages 12454–12465. Curran Associates, Inc., 2021. U...

work page 2021
[24]

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions,

Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qian- glong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, and Ting Liu. A survey on hallu- cination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems, 43(2):1–55, January 2025. ISSN 1558-2868. doi: 10.1145/370...

work page doi:10.1145/3703155 2025
[25]

Neural crf model for sentence alignment in text simplification

Chao Jiang, Mounica Maddela, Wuwei Lan, Yang Zhong, and Wei Xu. Neural crf model for sentence alignment in text simplification. InProceedings of the Association for Computational Linguistics (ACL), 2020. 11

work page 2020
[26]

TESS: Text-to-text self-conditioned simplex diffusion

Rabeeh Karimi Mahabadi, Hamish Ivison, Jaesung Tae, James Henderson, Iz Beltagy, Matthew Peters, and Arman Cohan. TESS: Text-to-text self-conditioned simplex diffusion. In Yvette Graham and Matthew Purver, editors,Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (V olume 1: Long Papers), pages 234...

work page 2024
[27]

BART: Denoising sequence-to-sequence pre- training for natural language generation, translation, and comprehension

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. BART: Denoising sequence-to-sequence pre- training for natural language generation, translation, and comprehension. In Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault, editors,Proceedings of the 58th Annual Meet...

work page
[28]

doi: 10.18653/v1/2020.acl-main.703

Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.703. URL https://aclanthology.org/2020.acl-main.703

work page doi:10.18653/v1/2020.acl-main.703 2020
[29]

Diffusion-lm improves controllable text generation

Xiang Li, John Thickstun, Ishaan Gulrajani, Percy S Liang, and Tatsunori B Hashimoto. Diffusion-lm improves controllable text generation. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors,Advances in Neural Infor- mation Processing Systems, volume 35, pages 4328–4343. Curran Associates, Inc.,

work page
[30]

URL https://proceedings.neurips.cc/paper_files/paper/2022/file/ 1be5bc25d50895ee656b8c2d9eb89d6a-Paper-Conference.pdf

work page 2022
[31]

Rouge: A package for automatic evaluation of summaries

Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. InText summarization branches out, pages 74–81, 2004

work page 2004
[32]

Text generation with diffusion language models: a pre-training approach with continuous paragraph denoise

Zhenghao Lin, Yeyun Gong, Yelong Shen, Tong Wu, Zhihao Fan, Chen Lin, Nan Duan, and Weizhu Chen. Text generation with diffusion language models: a pre-training approach with continuous paragraph denoise. InProceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023

work page 2023
[33]

Roberta: A robustly optimized bert pretraining approach, 2019

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. Roberta: A robustly optimized bert pretraining approach, 2019

work page 2019
[34]

Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

Aaron Lou, Chenlin Meng, and Stefano Ermon. Discrete diffusion modeling by estimating the ratios of the data distribution, 2024. URLhttps://arxiv.org/abs/2310.16834

work page internal anchor Pith review Pith/arXiv arXiv 2024
[35]

Latent diffusion for language generation

Justin Lovelace, Varsha Kishore, Chao Wan, Eliot Shekhtman, and Kilian Q Wein- berger. Latent diffusion for language generation. In A. Oh, T. Naumann, A. Glober- son, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Informa- tion Processing Systems, volume 36, pages 56998–57025. Curran Associates, Inc.,

work page
[36]

URL https://proceedings.neurips.cc/paper_files/paper/2023/file/ b2a2bd5d5051ff6af52e1ef60aefd255-Paper-Conference.pdf

work page 2023
[37]

A corpus and cloze evaluation for deeper understanding of commonsense stories

Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, and James Allen. A corpus and cloze evaluation for deeper understanding of commonsense stories. InProceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Tech- nologies,...

work page 2016
[38]

Cohen, and Mirella Lapata

Shashi Narayan, Shay B. Cohen, and Mirella Lapata. Don’t give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization. In Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii, editors,Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1797–1807, Brussels...

work page doi:10.18653/v1/d18-1206 2018
[39]

doi:10.3115/1073083.1073135 , editor =

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Pierre Isabelle, Eugene Charniak, and Dekang Lin, editors, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA, July 2002. Association for Comput...

work page doi:10.3115/1073083.1073135 2002
[40]

Mauve: Measuring the gap between neural text and human text using divergence frontiers

Krishna Pillutla, Swabha Swayamdipta, Rowan Zellers, John Thickstun, Sean Welleck, Yejin Choi, and Zaid Harchaoui. Mauve: Measuring the gap between neural text and human text using divergence frontiers. In M. Ranzato, A. Beygelzimer, Y . Dauphin, P.S. Liang, and J. Wortman Vaughan, editors,Advances in Neural Information Processing Systems, volume 34, pages 4816–

work page
[41]

URL https://proceedings.neurips.cc/paper_ files/paper/2021/file/260c2432a0eecc28ce03c10dadc078a4-Paper.pdf

Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper_ files/paper/2021/file/260c2432a0eecc28ce03c10dadc078a4-Paper.pdf

work page 2021
[42]

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. Sdxl: Improving latent diffusion models for high-resolution image synthesis, 2023. URLhttps://arxiv.org/abs/2307.01952

work page internal anchor Pith review Pith/arXiv arXiv 2023
[43]

Language models are unsupervised multitask learners

Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019. URL https://api.semanticscholar. org/CorpusID:160025533

work page 2019
[44]

Variational inference with normalizing flows

Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. InProceedings of the 32nd International Conference on International Conference on Machine Learning - V olume 37, ICML’15, page 1530–1538. JMLR.org, 2015

work page 2015
[45]

Generative modelling with inverse heat dissipation

Severi Rissanen, Markus Heinonen, and Arno Solin. Generative modelling with inverse heat dissipation. InThe Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=4PJUBT9f2Ol

work page 2023
[46]

High- resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022

work page 2022
[47]

Tencdm: Understanding the properties of the diffusion model in the space of language model encodings,

Alexander Shabalin, Viacheslav Meshchaninov, Egor Chimbulatov, Vladislav Lapikov, Roman Kim, Grigory Bartosh, Dmitry Molchanov, Sergey Markov, and Dmitry Vetrov. Tencdm: Understanding the properties of the diffusion model in the space of language model encodings,

work page
[48]

URLhttps://arxiv.org/abs/2402.19097

work page arXiv
[49]

The spread of low-credibility content by social bots.Nature Communications, 9(1), November 2018

Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Kai-Cheng Yang, Alessandro Flammini, and Filippo Menczer. The spread of low-credibility content by social bots.Nature Communications, 9(1), November 2018. ISSN 2041-1723. doi: 10.1038/s41467-018-06930-7. URLhttp://dx.doi.org/10.1038/s41467-018-06930-7

work page doi:10.1038/s41467-018-06930-7 2018
[50]

Attention is all you need.Advances in neural information processing systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017

work page 2017
[51]

Taxonomy of risks posed by language models

Laura Weidinger, Jonathan Uesato, Maribeth Rauh, Conor Griffin, Po-Sen Huang, John Mellor, Amelia Glaese, Myra Cheng, Borja Balle, Atoosa Kasirzadeh, Courtney Biles, Sasha Brown, Zac Kenton, Will Hawkins, Tom Stepleton, Abeba Birhane, Lisa Anne Hendricks, Laura Rimell, William Isaac, Julia Haas, Sean Legassick, Geoffrey Irving, and Iason Gabriel. Taxonomy...

work page doi:10.1145/3531146.3533088 2022
[52]

AR-diffusion: Auto-regressive diffusion model for text generation

Tong Wu, Zhihao Fan, Xiao Liu, Hai-Tao Zheng, Yeyun Gong, yelong shen, Jian Jiao, Juntao Li, zhongyu wei, Jian Guo, Nan Duan, and Weizhu Chen. AR-diffusion: Auto-regressive diffusion model for text generation. InThirty-seventh Conference on Neural Information Processing Systems, 2023. URLhttps://openreview.net/forum?id=0EG6qUQ4xE

work page 2023
[53]

Seqdiffuseq: Text diffusion with encoder-decoder transformers.ArXiv, abs/2212.10325, 2022

Hongyi Yuan, Zheng Yuan, Chuanqi Tan, Fei Huang, and Songfang Huang. Seqdiffuseq: Text diffusion with encoder-decoder transformers.ArXiv, abs/2212.10325, 2022

work page arXiv 2022
[54]

BERTScore: Evaluating Text Generation with BERT

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. Bertscore: Evaluating text generation with bert, 2020. URLhttps://arxiv.org/abs/1904.09675. 13 A Limitations Pre-trained EmbeddingsOur proposed method relies on a pre-trained embedding matrix E from the BERT model. While this choice simplifies the training process and improves i...

work page internal anchor Pith review Pith/arXiv arXiv 2020

[1] [1]

volume 4, pages 401–415,

Optimizing statistical machine translation for text simplification. volume 4, pages 401–415,

work page

[2] [2]

URLhttps://www.aclweb.org/anthology/Q16-1029

work page

[3] [3]

Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg

Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. Structured denoising diffusion models in discrete state-spaces. In M. Ranzato, A. Beygelzimer, Y . Dauphin, P.S. Liang, and J. Wortman Vaughan, editors,Advances in Neural Information Processing Systems, volume 34, pages 17981–17993. Curran Associates, Inc., 2021. URL htt...

work page 2021

[4] [4]

Stable video diffusion: Scaling latent video diffusion models to large datasets, 2023

Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Do- minik Lorenz, Yam Levi, Zion English, Vikram V oleti, Adam Letts, Varun Jampani, and Robin Rombach. Stable video diffusion: Scaling latent video diffusion models to large datasets, 2023

work page 2023

[5] [5]

Brown, Dawn Song, Úlfar Er- lingsson, Alina Oprea, and Colin Raffel

Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-V oss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, and Colin Raffel. Extracting training data from large language models, 2021. URL https://arxiv.org/abs/2012.07805

work page arXiv 2021

[6] [6]

Analog bits: Generating discrete data using diffusion models with self-conditioning, 2023

Ting Chen, Ruixiang Zhang, and Geoffrey Hinton. Analog bits: Generating discrete data using diffusion models with self-conditioning, 2023

work page 2023

[7] [7]

Quora question pairs

Zihang Chen, Hongbo Zhang, Xiaoji Zhang, and Leqi Zhao. Quora question pairs. 2017. URL https://api.semanticscholar.org/CorpusID:233225749

work page 2017

[8] [8]

Scaling instruction-finetuned language models.Journal of Machine Learning Research, 25(70):1–53, 2024

Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, et al. Scaling instruction-finetuned language models.Journal of Machine Learning Research, 25(70):1–53, 2024

work page 2024

[9] [9]

Schwing, and David Forsyth

Aditya Deshpande, Jyoti Aneja, Liwei Wang, Alexander G. Schwing, and David Forsyth. Fast, diverse and accurate image captioning guided by part-of-speech. In2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10687–10696, 2019. doi: 10.1109/CVPR.2019.01095

work page doi:10.1109/cvpr.2019.01095 2019

[10] [10]

BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Jill Burstein, Christy Doran, and Thamar Solorio, editors,Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, V ol...

work page doi:10.18653/v1/n19-1423 2019

[11] [11]

Quasar: Datasets for Question Answering by Search and Reading

Bhuwan Dhingra, Kathryn Mazaitis, and William W Cohen. Quasar: Datasets for question answering by search and reading.arXiv preprint arXiv:1707.03904, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[12] [12]

Diverse text generation via variational encoder-decoder models with gaussian process priors, 2022

Wanyu Du, Jianqiao Zhao, Liwei Wang, and Yangfeng Ji. Diverse text generation via variational encoder-decoder models with gaussian process priors, 2022. URL https://arxiv.org/abs/ 2204.01227

work page arXiv 2022

[13] [13]

Hawley, and Jordi Pons

Zach Evans, CJ Carr, Josiah Taylor, Scott H. Hawley, and Jordi Pons. Fast timing-conditioned latent audio diffusion, 2024

work page 2024

[14] [14]

Diffuseq: Sequence to sequence text generation with diffusion models

Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, and Lingpeng Kong. Diffuseq: Sequence to sequence text generation with diffusion models. InThe Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=jQj-_ rLVXsj. 10

work page 2023

[15] [15]

DiffuSeq-v2: Bridging discrete and continuous text spaces for accelerated Seq2Seq diffusion models

Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, and Lingpeng Kong. DiffuSeq-v2: Bridging discrete and continuous text spaces for accelerated Seq2Seq diffusion models. In Houda Bouamor, Juan Pino, and Kalika Bali, editors,Findings of the Association for Computational Linguistics: EMNLP 2023, pages 9868–9875, Singapore, December 2023. Association for Com...

work page doi:10.18653/v1/2023.findings-emnlp.660 2023

[16] [16]

Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio

Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014. URL https://proceed...

work page 2014

[17] [17]

Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Xin Zhao, and Ji-Rong Wen

Xiaochuang Han, Sachin Kumar, and Yulia Tsvetkov. SSD-LM: Semi-autoregressive simplex- based diffusion language model for text generation and modular control. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors,Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (V olume 1: Long Papers), pages 11575–11596, T...

work page doi:10.18653/v1/2023 2023

[18] [18]

Hashimoto, David Alvarez-Melis, and Tommi S

Tatsunori B. Hashimoto, David Alvarez-Melis, and Tommi S. Jaakkola. Word embeddings as metric recovery in semantic spaces.Transactions of the Association for Computational Linguistics, 4:273–286, 2016. doi: 10.1162/tacl_a_00098. URL https://aclanthology. org/Q16-1020/

work page doi:10.1162/tacl_a_00098 2016

[19] [19]

DiffusionBERT: Improving generative masked language models with diffusion models

Zhengfu He, Tianxiang Sun, Qiong Tang, Kuanning Wang, Xuanjing Huang, and Xipeng Qiu. DiffusionBERT: Improving generative masked language models with diffusion models. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors,Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (V olume 1: Long Papers), pages 4521–...

work page 2023

[20] [20]

Classifier-free diffusion guidance

Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. InNeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021. URL https://openreview. net/forum?id=qw8AKxfYbI

work page 2021

[21] [21]

Denoising diffusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors,Advances in Neural Information Processing Systems, volume 33, pages 6840–6851. Curran Associates, Inc., 2020

work page 2020

[22] [22]

Blurring diffusion models

Emiel Hoogeboom and Tim Salimans. Blurring diffusion models. InThe Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum? id=OjDkC57x5sz

work page 2023

[23] [23]

Argmax flows and multinomial diffusion: Learning categorical distributions

Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, and Max Welling. Argmax flows and multinomial diffusion: Learning categorical distributions. In M. Ranzato, A. Beygelzimer, Y . Dauphin, P.S. Liang, and J. Wortman Vaughan, editors,Advances in Neural Information Processing Systems, volume 34, pages 12454–12465. Curran Associates, Inc., 2021. U...

work page 2021

[24] [24]

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions,

Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qian- glong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, and Ting Liu. A survey on hallu- cination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems, 43(2):1–55, January 2025. ISSN 1558-2868. doi: 10.1145/370...

work page doi:10.1145/3703155 2025

[25] [25]

Neural crf model for sentence alignment in text simplification

Chao Jiang, Mounica Maddela, Wuwei Lan, Yang Zhong, and Wei Xu. Neural crf model for sentence alignment in text simplification. InProceedings of the Association for Computational Linguistics (ACL), 2020. 11

work page 2020

[26] [26]

TESS: Text-to-text self-conditioned simplex diffusion

Rabeeh Karimi Mahabadi, Hamish Ivison, Jaesung Tae, James Henderson, Iz Beltagy, Matthew Peters, and Arman Cohan. TESS: Text-to-text self-conditioned simplex diffusion. In Yvette Graham and Matthew Purver, editors,Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (V olume 1: Long Papers), pages 234...

work page 2024

[27] [27]

BART: Denoising sequence-to-sequence pre- training for natural language generation, translation, and comprehension

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. BART: Denoising sequence-to-sequence pre- training for natural language generation, translation, and comprehension. In Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault, editors,Proceedings of the 58th Annual Meet...

work page

[28] [28]

doi: 10.18653/v1/2020.acl-main.703

Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.703. URL https://aclanthology.org/2020.acl-main.703

work page doi:10.18653/v1/2020.acl-main.703 2020

[29] [29]

Diffusion-lm improves controllable text generation

Xiang Li, John Thickstun, Ishaan Gulrajani, Percy S Liang, and Tatsunori B Hashimoto. Diffusion-lm improves controllable text generation. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors,Advances in Neural Infor- mation Processing Systems, volume 35, pages 4328–4343. Curran Associates, Inc.,

work page

[30] [30]

URL https://proceedings.neurips.cc/paper_files/paper/2022/file/ 1be5bc25d50895ee656b8c2d9eb89d6a-Paper-Conference.pdf

work page 2022

[31] [31]

Rouge: A package for automatic evaluation of summaries

Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. InText summarization branches out, pages 74–81, 2004

work page 2004

[32] [32]

Text generation with diffusion language models: a pre-training approach with continuous paragraph denoise

Zhenghao Lin, Yeyun Gong, Yelong Shen, Tong Wu, Zhihao Fan, Chen Lin, Nan Duan, and Weizhu Chen. Text generation with diffusion language models: a pre-training approach with continuous paragraph denoise. InProceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023

work page 2023

[33] [33]

Roberta: A robustly optimized bert pretraining approach, 2019

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. Roberta: A robustly optimized bert pretraining approach, 2019

work page 2019

[34] [34]

Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

Aaron Lou, Chenlin Meng, and Stefano Ermon. Discrete diffusion modeling by estimating the ratios of the data distribution, 2024. URLhttps://arxiv.org/abs/2310.16834

work page internal anchor Pith review Pith/arXiv arXiv 2024

[35] [35]

Latent diffusion for language generation

Justin Lovelace, Varsha Kishore, Chao Wan, Eliot Shekhtman, and Kilian Q Wein- berger. Latent diffusion for language generation. In A. Oh, T. Naumann, A. Glober- son, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Informa- tion Processing Systems, volume 36, pages 56998–57025. Curran Associates, Inc.,

work page

[36] [36]

URL https://proceedings.neurips.cc/paper_files/paper/2023/file/ b2a2bd5d5051ff6af52e1ef60aefd255-Paper-Conference.pdf

work page 2023

[37] [37]

A corpus and cloze evaluation for deeper understanding of commonsense stories

Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, and James Allen. A corpus and cloze evaluation for deeper understanding of commonsense stories. InProceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Tech- nologies,...

work page 2016

[38] [38]

Cohen, and Mirella Lapata

Shashi Narayan, Shay B. Cohen, and Mirella Lapata. Don’t give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization. In Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii, editors,Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1797–1807, Brussels...

work page doi:10.18653/v1/d18-1206 2018

[39] [39]

doi:10.3115/1073083.1073135 , editor =

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Pierre Isabelle, Eugene Charniak, and Dekang Lin, editors, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA, July 2002. Association for Comput...

work page doi:10.3115/1073083.1073135 2002

[40] [40]

Mauve: Measuring the gap between neural text and human text using divergence frontiers

Krishna Pillutla, Swabha Swayamdipta, Rowan Zellers, John Thickstun, Sean Welleck, Yejin Choi, and Zaid Harchaoui. Mauve: Measuring the gap between neural text and human text using divergence frontiers. In M. Ranzato, A. Beygelzimer, Y . Dauphin, P.S. Liang, and J. Wortman Vaughan, editors,Advances in Neural Information Processing Systems, volume 34, pages 4816–

work page

[41] [41]

URL https://proceedings.neurips.cc/paper_ files/paper/2021/file/260c2432a0eecc28ce03c10dadc078a4-Paper.pdf

Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper_ files/paper/2021/file/260c2432a0eecc28ce03c10dadc078a4-Paper.pdf

work page 2021

[42] [42]

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. Sdxl: Improving latent diffusion models for high-resolution image synthesis, 2023. URLhttps://arxiv.org/abs/2307.01952

work page internal anchor Pith review Pith/arXiv arXiv 2023

[43] [43]

Language models are unsupervised multitask learners

Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019. URL https://api.semanticscholar. org/CorpusID:160025533

work page 2019

[44] [44]

Variational inference with normalizing flows

Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. InProceedings of the 32nd International Conference on International Conference on Machine Learning - V olume 37, ICML’15, page 1530–1538. JMLR.org, 2015

work page 2015

[45] [45]

Generative modelling with inverse heat dissipation

Severi Rissanen, Markus Heinonen, and Arno Solin. Generative modelling with inverse heat dissipation. InThe Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=4PJUBT9f2Ol

work page 2023

[46] [46]

High- resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022

work page 2022

[47] [47]

Tencdm: Understanding the properties of the diffusion model in the space of language model encodings,

Alexander Shabalin, Viacheslav Meshchaninov, Egor Chimbulatov, Vladislav Lapikov, Roman Kim, Grigory Bartosh, Dmitry Molchanov, Sergey Markov, and Dmitry Vetrov. Tencdm: Understanding the properties of the diffusion model in the space of language model encodings,

work page

[48] [48]

URLhttps://arxiv.org/abs/2402.19097

work page arXiv

[49] [49]

The spread of low-credibility content by social bots.Nature Communications, 9(1), November 2018

Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Kai-Cheng Yang, Alessandro Flammini, and Filippo Menczer. The spread of low-credibility content by social bots.Nature Communications, 9(1), November 2018. ISSN 2041-1723. doi: 10.1038/s41467-018-06930-7. URLhttp://dx.doi.org/10.1038/s41467-018-06930-7

work page doi:10.1038/s41467-018-06930-7 2018

[50] [50]

Attention is all you need.Advances in neural information processing systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017

work page 2017

[51] [51]

Taxonomy of risks posed by language models

Laura Weidinger, Jonathan Uesato, Maribeth Rauh, Conor Griffin, Po-Sen Huang, John Mellor, Amelia Glaese, Myra Cheng, Borja Balle, Atoosa Kasirzadeh, Courtney Biles, Sasha Brown, Zac Kenton, Will Hawkins, Tom Stepleton, Abeba Birhane, Lisa Anne Hendricks, Laura Rimell, William Isaac, Julia Haas, Sean Legassick, Geoffrey Irving, and Iason Gabriel. Taxonomy...

work page doi:10.1145/3531146.3533088 2022

[52] [52]

AR-diffusion: Auto-regressive diffusion model for text generation

Tong Wu, Zhihao Fan, Xiao Liu, Hai-Tao Zheng, Yeyun Gong, yelong shen, Jian Jiao, Juntao Li, zhongyu wei, Jian Guo, Nan Duan, and Weizhu Chen. AR-diffusion: Auto-regressive diffusion model for text generation. InThirty-seventh Conference on Neural Information Processing Systems, 2023. URLhttps://openreview.net/forum?id=0EG6qUQ4xE

work page 2023

[53] [53]

Seqdiffuseq: Text diffusion with encoder-decoder transformers.ArXiv, abs/2212.10325, 2022

Hongyi Yuan, Zheng Yuan, Chuanqi Tan, Fei Huang, and Songfang Huang. Seqdiffuseq: Text diffusion with encoder-decoder transformers.ArXiv, abs/2212.10325, 2022

work page arXiv 2022

[54] [54]

BERTScore: Evaluating Text Generation with BERT

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. Bertscore: Evaluating text generation with bert, 2020. URLhttps://arxiv.org/abs/1904.09675. 13 A Limitations Pre-trained EmbeddingsOur proposed method relies on a pre-trained embedding matrix E from the BERT model. While this choice simplifies the training process and improves i...

work page internal anchor Pith review Pith/arXiv arXiv 2020