Self-conditioned Flow Map Language Models via Fixed-point Flows

Chanhyuk Lee; Floor Eijkelboom; Jaehoon Yoo; Jinwoo Kim; Nicholas M. Boffi; Seunghoon Hong; Wonjung Kim

arXiv: 2607.00714 · v1 · pith:DKYTWDOHnew · submitted 2026-07-01 · cs.CL · cs.AI

Jaehoon Yoo, Wonjung Kim, Floor Eijkelboom, Chanhyuk Lee, Nicholas M. Boffi, Seunghoon Hong, Jinwoo Kim

Pith reviewed 2026-07-02 12:59 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords self-conditioning · flow language models · fixed-point iteration · flow maps · model distillation · few-step generation · text generation · OpenWebText

The pith

Self-conditioned flow language models solve a fixed-point iteration that bootstraps the denoiser and enables distillation to strong one-step models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that self-conditioning in continuous flow-based language models corresponds to solving a fixed-point iteration. This iteration uses the model's own denoising estimate to improve its performance. The authors define fixed-point flows as a two-dimensional class combining the flow process with this fixed-point iteration. These flows are valid flow maps that can be distilled from self-conditioned models through fixed-point distillation and flow map distillation. The resulting FMLM* model outperforms prior self-conditioned and few-step models in one- and few-step generation on OpenWebText.
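To make the fixed-point reading concrete, here is a minimal sketch of the iteration z_{k+1} = D(x_t, z_k). The function `toy_denoiser`, its blending weight `a`, and the tanh target are illustrative assumptions and are not taken from the paper; the toy is built so that the exact fixed point is tanh(x_t).

```python
import numpy as np

def toy_denoiser(x_t, z, a=0.5):
    """Hypothetical self-conditioned denoiser D(x_t, z) (not the paper's model).

    It blends a function of the noisy input with the previous estimate z.
    With |a| < 1 the map z -> D(x_t, z) is a contraction whose fixed point
    is tanh(x_t).
    """
    return (1.0 - a) * np.tanh(x_t) + a * z

def self_condition(x_t, num_iters, a=0.5):
    """Iterate z_{k+1} = D(x_t, z_k) from z_0 = 0 and record step sizes."""
    z = np.zeros_like(x_t)
    residuals = []
    for _ in range(num_iters):
        z_next = toy_denoiser(x_t, z, a)
        residuals.append(float(np.linalg.norm(z_next - z)))
        z = z_next
    return z, residuals

rng = np.random.default_rng(0)
x_t = rng.normal(size=8)
z_star, residuals = self_condition(x_t, num_iters=30)
```

In this toy, each pass feeds the model's own estimate back in, and the step size `residuals[k]` shrinks by the factor `a` per iteration. Whether a trained denoiser behaves this way depends on properties the abstract does not state.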

Core claim

Flow language models with self-conditioning solve a fixed-point iteration that bootstraps the performance of the learned denoiser. Fixed-point flows define valid flow maps and can be distilled from self-conditioned models by compressing fixed-point iterations and the flow process, with FMLM* outperforming state-of-the-art self-conditioned models and few-step models in one- and few-step generation on OpenWebText.

What carries the argument

Fixed-point flows, a two-dimensional class of self-conditioned flows with one dimension for the flow process and one for the fixed-point iteration.
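The two dimensions can be pictured as a nested loop: an outer loop over flow time and an inner loop over fixed-point iterations. The sketch below is an illustration under stated assumptions, not the paper's sampler: it assumes a linear interpolant x_t = (1 - t) x_0 + t x_1, Euler integration, a cold start (z = 0) at every flow step, and toy data concentrated on a single point `MU`, for which the ideal clean estimate is `MU` everywhere. The names `denoiser` and `sample` are hypothetical.

```python
import numpy as np

MU = np.array([1.0, -2.0, 0.5])  # toy "data": one clean embedding (assumption)

def denoiser(x_t, t, z, a=0.5):
    """Hypothetical self-conditioned denoiser D(x_t, t, z).

    For point-mass data the ideal clean estimate is MU for every (x_t, t);
    the a * z term models dependence on the previous estimate.
    """
    return (1.0 - a) * MU + a * z

def sample(x0, num_flow_steps, num_fp_iters):
    """Outer loop: Euler steps in flow time. Inner loop: fixed-point iterations."""
    x = x0.copy()
    dt = 1.0 / num_flow_steps
    for n in range(num_flow_steps):
        t = n * dt
        z = np.zeros_like(x)               # cold start at every flow step
        for _ in range(num_fp_iters):      # second dimension: iteration index
            z = denoiser(x, t, z)
        velocity = (z - x) / (1.0 - t)     # valid for x_t = (1 - t) x_0 + t x_1
        x = x + dt * velocity              # first dimension: flow time
    return x

x0 = np.random.default_rng(0).normal(size=3)
x_one_iter = sample(x0, num_flow_steps=8, num_fp_iters=1)
x_many_iters = sample(x0, num_flow_steps=8, num_fp_iters=40)
```

In this toy, one inner iteration lands at `0.5 * MU`, while forty inner iterations land essentially on `MU`, which illustrates why compressing the inner loop is worth distilling.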

If this is right

Fixed-point flows define valid flow maps.
Self-conditioned models can be distilled into flow map language models via fixed-point distillation and flow map distillation.
The distilled FMLM* outperforms state-of-the-art self-conditioned models and few-step models in one- and few-step generation on OpenWebText.
The performance gains of self-conditioning arise from bootstrapping the denoiser through the fixed-point iteration.
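For readers unfamiliar with the term, the block below states what a "valid flow map" is usually taken to mean. The notation (v_t for the velocity field, X_{s,t} for the two-time map) is an assumption made here for illustration; the paper's own definitions and conditions may be phrased differently.

```latex
% Probability-flow ODE with velocity field v_t (notation assumed here)
\dot{x}_t = v_t(x_t), \qquad t \in [0, 1]

% Two-time flow map: jump along a trajectory from time s to time t
X_{s,t}(x_s) = x_t, \qquad 0 \le s \le t \le 1

% Conditions a valid flow map satisfies
X_{s,s}(x) = x                                  % identity on the diagonal
X_{t,u}\bigl(X_{s,t}(x)\bigr) = X_{s,u}(x)      % composition (semigroup) property
\partial_t X_{s,t}(x)\big|_{t=s} = v_s(x)       % tangent condition
```

One-step generation corresponds to evaluating X_{0,1} once; few-step generation composes several shorter jumps.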

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The fixed-point view might extend to self-conditioning techniques used in non-language generative models such as images or audio.
Explicit optimization of the fixed-point structure during training could yield further gains beyond current distillation.
Flow map distillation may reduce inference latency enough to enable real-time applications that current iterative models cannot support.
The correspondence between self-conditioning and fixed-point iteration could help analyze convergence rates in other iterative generative processes.

Load-bearing premise

Self-conditioning must exactly match a fixed-point iteration, and the proposed distillation steps must preserve performance without extra assumptions on the denoiser or the data.

What would settle it

Training a flow language model without self-conditioning and measuring whether its one-step generation performance matches or exceeds the self-conditioned version on the same benchmark.

Figures

Figures reproduced from arXiv: 2607.00714 by Chanhyuk Lee, Floor Eijkelboom, Jaehoon Yoo, Jinwoo Kim, Nicholas M. Boffi, Seunghoon Hong, Wonjung Kim.

**Figure 1.** Overview. We show that flow language models with self-conditioning solve a fixed-point iteration that refines the denoising estimate. We leverage this insight to formulate fixed-point flows, a class of self-conditioned flows that run fixed-point iterations at each flow timestep. Fixed-point flows yield valid flow maps, which we learn by compressing both the flow and the fixed-point iterations. • Fixed-poin…

**Figure 2.** Convergence towards the fixed point across fixed-point iterations.

**Figure 3.** Warm-start and cold-start sampling with 1 and 100 fixed-point iterations.

**Figure 4.** Comparison between ELF⋆ and its teacher model ELF. Self-conditioning is removable (Q3). We find that, given a self-conditioned denoiser $\hat{D}$ (ELF in our case), we can learn its associated fixed-point denoiser $D^\star$ (13) through fixed-point distillation, which yields fixed-point velocity (14), a self-conditioning-free model. Herein, we consider CDEQ (Lin et al., 2026), an existing fixed-point distillation metho…

**Figure 5.** Comparison between FMLM⋆ and teacher model ELF. Self-conditioning is distillable into a flow map (Q4). Finally, we ask whether self-conditioning can be leveraged to train a few-step flow map. To this end, we distill the self-conditioned ELF teacher into a self-conditioned flow map language model FMLM⋆, parameterized by the two-time denoiser $\delta_{s,t}$ (22). Following the offline route proposed in Section 3.4, …

**Figure 6.** A sample generated by FMLM⋆ with one-step decoding.

**Figure 7.** A sample generated by FMLM⋆ with two-step decoding.

**Figure 8.** A sample generated by FMLM⋆ with four-step decoding.
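Figure 3 contrasts warm-start and cold-start sampling, but the captions here do not define the terms. The sketch below assumes one plausible reading: a cold start initializes the inner iteration at zero, while a warm start reuses the converged estimate from the previous flow step. The toy `denoiser` and all names are hypothetical and only illustrate why the two should differ at 1 iteration and agree at 100.

```python
import numpy as np

def denoiser(x_t, z, a=0.5):
    """Hypothetical contraction in z with exact fixed point tanh(x_t)."""
    return (1.0 - a) * np.tanh(x_t) + a * z

def fp_error(x_t, z_init, num_iters):
    """Distance to the fixed point after num_iters iterations from z_init."""
    z = z_init.copy()
    for _ in range(num_iters):
        z = denoiser(x_t, z)
    return float(np.linalg.norm(z - np.tanh(x_t)))

rng = np.random.default_rng(0)
x_prev = rng.normal(size=8)                   # state at the previous flow step
x_curr = x_prev + 0.01 * rng.normal(size=8)   # state moved slightly by one step

z_cold = np.zeros_like(x_curr)                # cold start: no prior estimate
z_warm = np.tanh(x_prev)                      # warm start: previous fixed point

errors = {
    (name, k): fp_error(x_curr, z0, k)
    for name, z0 in [("cold", z_cold), ("warm", z_warm)]
    for k in (1, 100)
}
```

Under this reading, the warm start begins close to the new fixed point because consecutive states are close, so a single iteration already gives a small error; with many iterations the initialization stops mattering.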

read the original abstract

Self-conditioning is a core technique that enhances continuous flow-based language models, where the model learns to denoise generated text by conditioning on its own denoising estimate. While empirically successful, its performance improvements are poorly understood. Moreover, there is growing interest in the use of few-step generators based on flow maps, for which how to leverage self-conditioning is unclear. Here, we show that flow language models with self-conditioning solve a fixed-point iteration that bootstraps the performance of the learned denoiser. We use this viewpoint to formulate fixed-point flows, a two-dimensional class of self-conditioned flows, where the first dimension represents the flow process and the second represents the fixed-point iteration. We show that fixed-point flows define valid flow maps, and show that they can be distilled from self-conditioned flow models by compressing both fixed-point iterations and the flow process, the former with fixed-point distillation and the latter with flow map distillation. Our resulting flow map language model, FMLM$^\star$, outperforms state-of-the-art self-conditioned models and few-step models in one- and few-step generation on OpenWebText. Code is available at https://github.com/Ugness/self-conditioned-fmlm.
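The idea of compressing fixed-point iterations into a single forward pass can be illustrated with a toy regression. This is a stand-in, not the distillation method used in the paper: the teacher is assumed linear, z ↦ A z + B x with a contractive `A`, and the student is a linear map fitted by least squares to the teacher's converged iterates. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
A0 = rng.normal(size=(d, d))
A = 0.5 * A0 / np.linalg.norm(A0, 2)   # spectral norm 0.5, so the teacher contracts
B = rng.normal(size=(d, d))

def teacher_step(X, Z):
    """One self-conditioning step of the toy teacher, applied row-wise."""
    return Z @ A.T + X @ B.T

def teacher_fixed_point(X, num_iters=100):
    """Run the teacher's iteration to (numerical) convergence from Z = 0."""
    Z = np.zeros_like(X)
    for _ in range(num_iters):
        Z = teacher_step(X, Z)
    return Z

# Distillation data: inputs paired with the teacher's converged estimates.
X = rng.normal(size=(256, d))
Z_star = teacher_fixed_point(X)

# Student: a single linear map fitted so that X @ M approximates Z_star.
M, *_ = np.linalg.lstsq(X, Z_star, rcond=None)

def student(X_new):
    """One forward pass that replaces the whole inner iteration."""
    return X_new @ M
```

In this linear toy the student recovers the closed-form fixed-point map (I - A)^{-1} B, so one pass reproduces what the teacher needs many iterations to reach. A neural student trained on a real denoiser would only approximate this.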

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper reframes self-conditioning as a fixed-point iteration to enable distillation into competitive few-step flow map models.

This paper's main point is that self-conditioning in flow language models can be viewed as solving a fixed-point iteration on the denoiser. The authors use this view to define a two-dimensional class of fixed-point flows and to distill them into flow map language models.

They treat the flow process as one dimension and the fixed-point iteration as the second. The work shows these define valid flow maps and introduces fixed-point distillation plus flow map distillation to compress the original self-conditioned model. The result, FMLM*, beats prior self-conditioned models and other few-step baselines on OpenWebText for one- and few-step generation, with code released.

The new formulation is the clearest contribution. It gives a cleaner theoretical handle on why self-conditioning improves the denoiser and connects it to fixed-point ideas in a way that supports the distillation steps. The empirical results on the benchmark are straightforward and show gains where claimed.

The soft spot is the exact equivalence. Self-conditioning is presented as solving the fixed-point iteration, but this may only hold under conditions like the denoiser acting as a contraction mapping, which is not automatic for discrete token spaces. The distillation steps also need to preserve validity and performance without unstated assumptions on convergence or the flow regime. If the full derivations confirm these points, the claims stand; otherwise the outperformance could be narrower than stated.
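The contraction condition mentioned here has a standard consequence, stated below for reference. This is the Banach fixed-point theorem applied to the map z ↦ D(x_t, t, z) on a Euclidean embedding space; the symbols D, L, and z_k are notation assumed for illustration, and nothing in the abstract confirms that the learned denoiser satisfies the assumption.

```latex
% Assumption: for fixed (x_t, t), the map z \mapsto D(x_t, t, z) is L-Lipschitz with L < 1
\| D(x_t, t, z) - D(x_t, t, z') \| \le L \, \| z - z' \|, \qquad 0 \le L < 1

% Consequence 1: a unique fixed point exists
z^\star = D(x_t, t, z^\star)

% Consequence 2: the iteration z_{k+1} = D(x_t, t, z_k) converges geometrically from any z_0
\| z_k - z^\star \| \le \frac{L^k}{1 - L} \, \| z_1 - z_0 \|
```

Without a bound of this kind, the iteration may still converge in practice, but the convergence is an empirical observation rather than a guarantee.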

This is for people working on flow or diffusion-style models for text, especially those interested in few-step sampling or theoretical views on conditioning. A reader focused on efficient generative modeling would get direct value. It deserves peer review because the new angle and the reported improvements are concrete enough to warrant checking the math and experiments in detail.

I would send it to referees with attention on the fixed-point equivalence and whether the distillation preserves the claimed properties.

Referee Report

2 major / 1 minor

Summary. The paper claims that self-conditioning in flow language models corresponds to a fixed-point iteration that bootstraps the denoiser. It introduces fixed-point flows as a two-dimensional class of self-conditioned flows (one dimension for the flow process, one for the iteration), proves that these define valid flow maps, and shows that they can be distilled from self-conditioned models via fixed-point distillation and flow map distillation. The resulting model, FMLM*, outperforms state-of-the-art self-conditioned and few-step models in one- and few-step generation on OpenWebText. Code is released.

Significance. If the fixed-point equivalence and distillation preservation hold without unstated assumptions on the denoiser (e.g., contraction properties), the work supplies a theoretical grounding for an empirical technique and a route to efficient few-step flow-map LMs. Public code is a clear strength for reproducibility.

major comments (2)

[Abstract] The central claim that self-conditioning 'exactly' solves a fixed-point iteration (bootstrapping the denoiser) is asserted without an explicit derivation or statement of the required conditions (e.g., Lipschitz continuity or contraction mapping on the discrete token embedding space). This equivalence is load-bearing for both the 2D fixed-point flow construction and the subsequent distillation claims; if it is only approximate, the validity of the flow maps and the performance preservation in FMLM* do not follow.
[Abstract, distillation paragraph] The statement that fixed-point flows 'can be distilled ... by compressing both fixed-point iterations and the flow process' with no loss of validity or performance is presented without the precise compression operators or any verification that the resulting map remains a valid flow map. This is load-bearing for the claim that FMLM* outperforms baselines.

minor comments (1)

The abstract mentions 'continuous flow-based language models' but does not clarify how the discrete token nature interacts with the continuous flow formulation; a brief note on embedding/rounding would improve clarity.
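As an illustration of the embedding and rounding step this comment asks about, here is a minimal sketch that maps token ids to continuous vectors and back by nearest-neighbor lookup. Nearest-neighbor rounding is one common choice and is assumed here; the abstract does not say how the paper or its teacher model decodes, and the table size, dimensions, and function names are hypothetical.

```python
import numpy as np

def embed(token_ids, table):
    """Look up a continuous embedding for each discrete token id."""
    return table[token_ids]

def round_to_tokens(latents, table):
    """Map each latent vector to the id of its nearest embedding (L2 distance)."""
    diffs = latents[:, None, :] - table[None, :, :]   # (seq_len, vocab, dim)
    sq_dists = (diffs ** 2).sum(axis=-1)              # (seq_len, vocab)
    return sq_dists.argmin(axis=-1)

rng = np.random.default_rng(0)
table = rng.normal(size=(50, 16))        # toy vocabulary of 50 tokens, 16 dims
token_ids = np.array([3, 17, 42, 3])

# A generated latent is rarely exactly on an embedding; model that with small noise.
latents = embed(token_ids, table) + 0.05 * rng.normal(size=(4, 16))
decoded = round_to_tokens(latents, table)
```

The flow operates entirely on `latents`; discreteness only enters at this final rounding step, which is why the interaction the referee asks about deserves a sentence in the paper.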

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We respond to each major comment below and indicate where revisions will be made to improve clarity without altering the core technical claims.

Referee: [Abstract] The central claim that self-conditioning 'exactly' solves a fixed-point iteration (bootstrapping the denoiser) is asserted without an explicit derivation or statement of the required conditions (e.g., Lipschitz continuity or contraction mapping on the discrete token embedding space). This equivalence is load-bearing for both the 2D fixed-point flow construction and the subsequent distillation claims; if it is only approximate, the validity of the flow maps and the performance preservation in FMLM* do not follow.

Authors: Section 3 of the manuscript provides the explicit derivation establishing that self-conditioning corresponds to a fixed-point iteration, under the assumption that the denoiser acts as a contraction mapping on the token embedding space. The abstract summarizes this result at a high level. We agree that referencing the key condition would strengthen the abstract and will revise it accordingly in the next version to state the contraction mapping requirement.
Revision: yes

Referee: [Abstract, distillation paragraph] The statement that fixed-point flows 'can be distilled ... by compressing both fixed-point iterations and the flow process' with no loss of validity or performance is presented without the precise compression operators or any verification that the resulting map remains a valid flow map. This is load-bearing for the claim that FMLM* outperforms baselines.

Authors: Sections 4 and 5 define the fixed-point distillation and flow-map distillation operators and include proofs that the resulting map remains a valid flow map with preserved performance. The abstract provides a concise overview. We will revise the abstract to note that validity and performance are preserved under the operators defined in the main text.
Revision: yes

Circularity Check

2 steps flagged

Self-conditioning to fixed-point iteration equivalence and distillation by construction

specific steps

self-definitional [Abstract]
"we show that flow language models with self-conditioning solve a fixed-point iteration that bootstraps the performance of the learned denoiser. We use this viewpoint to formulate fixed-point flows, a two-dimensional class of self-conditioned flows, where the first dimension represents the flow process and the second represents the fixed-point iteration."

The paper presents the equivalence of self-conditioning to a fixed-point iteration as a result to be shown, then immediately uses 'this viewpoint' to define the new two-dimensional class. The bootstrapping performance is thereby tied directly to the iteration that is itself the self-conditioning mechanism, making the derivation self-definitional rather than independently derived.

fitted input called prediction [Abstract]
"We show that fixed-point flows define valid flow maps, and show that they can be distilled from self-conditioned flow models by compressing both fixed-point iterations and the flow process, the former with fixed-point distillation and the latter with flow map distillation. Our resulting flow map language model, FMLM*, outperforms state-of-the-art self-conditioned models and few-step models"

The distillation (fixed-point and flow map) is defined by compressing the iterations and process already present in the self-conditioned model; the claim that FMLM* outperforms is then presented as a prediction, but the construction ensures the distilled model inherits the original mechanisms by design.

full rationale

The paper's central claim equates self-conditioning with solving a fixed-point iteration and then uses that to define fixed-point flows and their distillation. This equivalence is asserted in the abstract as something 'shown,' but the provided text frames it as a viewpoint that directly motivates the new formulation and compression steps without external grounding or independent verification of the iteration property. Distillation is presented as preserving validity by compressing the same mechanisms, risking reduction to rephrasing. This yields partial circularity on the load-bearing claims, though the paper may contain independent empirical results on OpenWebText.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; ledger is empty pending full text.

Reference graph

Works this paper leans on

28 extracted references · 23 canonical work pages · 15 internal anchors

[1] Georgios Batzolis, Mark Girolami, and Luca Ambrogioni. Towards closing the autoregressive gap in language modeling via entropy-gated continuous bitstream diffusion. arXiv preprint arXiv:2605.07013.

[2] Jannis Chemseddine, Gregor Kornhardt, and Gabriele Steidl. Spherical flows for sampling categorical data. arXiv preprint arXiv:2605.05629.

[3] Ting Chen, Ruixiang Zhang, and Geoffrey Hinton. Analog bits: Generating discrete data using diffusion models with self-conditioning. arXiv preprint arXiv:2208.04202, 2022.

[4] Yuxin Chen, Chumeng Liang, Hangke Sui, Ruihan Guo, Chaoran Cheng, Jiaxuan You, and Ge Liu. LangFlow: Continuous diffusion rivals discrete in language modeling. arXiv preprint arXiv:2604.11748.

[5] Justin Deschenaux and Caglar Gulcehre. Beyond autoregression: Fast LLMs via self-distillation through time. arXiv preprint arXiv:2410.21035.

[6] Justin Deschenaux and Caglar Gulcehre. Language modeling with hyperspherical flows. arXiv preprint arXiv:2605.11125.

[7] Sander Dieleman, Laurent Sartran, Arman Roshannai, Nikolay Savinov, Yaroslav Ganin, Pierre H Richemond, Arnaud Doucet, Robin Strudel, Chris Dyer, Conor Durkan, et al. Continuous diffusion for categorical data. arXiv preprint arXiv:2211.15089.

[8] Satoshi Hayakawa, Yuhta Takida, Masaaki Imaizumi, Hiromi Wakaki, and Yuki Mitsufuji. Distillation of discrete diffusion through dimensional correlations. arXiv preprint arXiv:2410.08709.

[9] Keya Hu, Linlu Qiu, Yiyang Lu, Hanhong Zhao, Tianhong Li, Yoon Kim, Jacob Andreas, and Kaiming He. ELF: Embedded language flows. arXiv preprint arXiv:2605.10938.

[10] Mathieu Lauriere. Numerical methods for mean field games and mean field type control. arXiv preprint arXiv:2106.06231.

[11] Chanhyuk Lee, Jaehoon Yoo, Manan Agarwal, Sheel Shah, Jerry Huang, Aditi Raghunathan, Seunghoon Hong, Nicholas M Boffi, and Jinwoo Kim. Flow map language models: One-step language modeling via continuous denoising. arXiv preprint arXiv:2602.16813.

[12] Junchao Lin, Zenan Ling, Jingwen Xu, and Robert C Qiu. Consistency deep equilibrium models. arXiv preprint arXiv:2602.03024.

[13] Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747.

[14] Yiyang Lu, Susie Lu, Qiao Sun, Hanhong Zhao, Zhicheng Jiang, Xianbang Wang, Tianhong Li, Zhengyang Geng, and Kaiming He. One-step latent-free image generation with pixel mean flows. arXiv preprint arXiv:2601.22158.

[15] Viacheslav Meshchaninov, Alexander Shabalin, Egor Chimbulatov, Nikita Gushchin, Ilya Koziev, Alexander Korotin, and Dmitry Vetrov. How to train your latent diffusion language model jointly with the latent space. arXiv preprint arXiv:2605.07933.

[16] Sajad Movahedi, Vera Milovanović, Shlomo Libo Feigin, Alexander Theus, Thomas Hofmann, Valentina Boeva, T Konstantin Rusch, and Antonio Orvieto. Fixed-point reasoners: Stable and adaptive deep looped transformers. arXiv preprint arXiv:2606.18206.

[17] Peter Potaptchik, Jason Yim, Adhi Saravanan, Peter Holderrieth, Eric Vanden-Eijnden, and Michael S Albergo. Discrete flow maps. arXiv preprint arXiv:2604.09784.

[18] Patrick Pynadath, Jiaxin Shi, and Ruqi Zhang. Candi: Hybrid discrete-continuous diffusion models. arXiv preprint arXiv:2510.22510.

[19] Daan Roos, Oscar Davis, Floor Eijkelboom, Michael Bronstein, Max Welling, Ismail Ilkan Ceylan, Luca Ambrogioni, and Jan-Willem van de Meent. Categorical flow maps. arXiv preprint arXiv:2602.12233.

[20] Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, and Volodymyr Kuleshov. The diffusion duality. arXiv preprint arXiv:2506.10892.

[21] Alexander Shabalin, Simon Elistratov, Viacheslav Meshchaninov, Ildus Sadrtdinov, and Dmitry Vetrov. Why Gaussian diffusion models fail on discrete data? arXiv preprint arXiv:2604.02028.

[22] Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, et al. Self-conditioned embedding diffusion for text generation. arXiv preprint arXiv:2211.04236.

[23] Zhihan Yang, Wei Guo, Shuibai Zhang, Subham Sekhar Sahoo, Yongxin Chen, Arash Vahdat, Morteza Mardani, and John Thickstun. Continuous diffusion scales competitively with discrete diffusion for language. arXiv preprint arXiv:2605.18530.
[24] with consistency deep equilibrium (CDEQ) distillation (Lin et al., 2026). The distillation compresses the fixed-point iteration (10) into a single forward pass that predicts its limit z⋆, so the resulting model matches the converged self-conditioned denoiser without iterating at inference. Architecture. ELF (Hu et al., 2026)

[25] conditions on the flow time t through a bank of four learned prefix tokens, prepended to the latent sequence, to which an embedding of t is added; the self-conditioning guidance weight enters through a second such bank. Following this design, to let the student track progress along the iteration, we condition it on a consistency time τ which represents th...

[26] with the last hidden states of a pretrained GPT-2 Large model (Radford et al., 2019). Aside from this modification, we strictly follow the original ELF architecture and train this variant using the identical hyperparameters. We follow the hyperparameter settings of the original ELF model. Both models are trained for 5 epochs with a global batch size of 51...

[27] with γ = 0.75 and 1.0, respectively. C Qualitative Results. We provide generation samples from the FMLM⋆ model. The one-step, two-step, and four-step samples are shown in Figure 6, Figure 7, and Figure 8, respectively. Sampling Steps: 1, gPPL: 92.37, Entropy: 5.27. are related to increasing the short-term size of the water cover, or...
