{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:YYYCE7JAXC76Q5PEEXX4PR3CWE","short_pith_number":"pith:YYYCE7JA","schema_version":"1.0","canonical_sha256":"c630227d20b8bfe875e425efc7c762b10fa4c5619500e55f5688962c9405e32c","source":{"kind":"arxiv","id":"2602.16813","version":3},"attestation_state":"computed","paper":{"title":"Flow Map Language Models: One-step Language Modeling via Continuous Denoising","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Continuous flows over one-hot token embeddings match discrete diffusion quality and enable one-step generation that exceeds eight-step baselines.","cross_cats":["cs.AI"],"primary_cat":"cs.CL","authors_text":"Aditi Raghunathan, Chanhyuk Lee, Jaehoon Yoo, Jerry Huang, Jinwoo Kim, Manan Agarwal, Nicholas M. Boffi, Seunghoon Hong, Sheel Shah","submitted_at":"2026-02-18T19:23:07Z","abstract_excerpt":"Language models based on discrete diffusion have attracted widespread interest for their potential to provide faster generation than autoregressive models. Despite their promise, these models typically produce samples whose quality sharply degrades in the few-step regime, preventing a dramatic speedup in practice. Here, we show that language models based on continuous flows over one-hot token embeddings can outperform discrete diffusion in both quality and speed. Importantly, our continuous formulation defines a unique flow map that can be learned directly for efficient few-step inference, a s"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":true},"canonical_record":{"source":{"id":"2602.16813","kind":"arxiv","version":3},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CL","submitted_at":"2026-02-18T19:23:07Z","cross_cats_sorted":["cs.AI"],"title_canon_sha256":"375548dedff37735ecf8b63591b6730351be992e4f018459d7f0776cf108523f","abstract_canon_sha256":"075546144abb6cae9abd8bc29235b660f8d6fd391bcea3423e6091466e61649e"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-21T02:04:59.370036Z","signature_b64":"MGlUZi4qFGFta2qE+1ZGYOgULd+LtW0wBtScsCdTu2JPxQ1NrklpqWtIJopcGFP+bvvoKGz9FVuUhWa9FB6EDg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"c630227d20b8bfe875e425efc7c762b10fa4c5619500e55f5688962c9405e32c","last_reissued_at":"2026-05-21T02:04:59.369201Z","signature_status":"signed_v1","first_computed_at":"2026-05-21T02:04:59.369201Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Flow Map Language Models: One-step Language Modeling via Continuous Denoising","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Continuous flows over one-hot token embeddings match discrete diffusion quality and enable one-step generation that exceeds eight-step baselines.","cross_cats":["cs.AI"],"primary_cat":"cs.CL","authors_text":"Aditi Raghunathan, Chanhyuk Lee, Jaehoon Yoo, Jerry Huang, Jinwoo Kim, Manan Agarwal, Nicholas M. Boffi, Seunghoon Hong, Sheel Shah","submitted_at":"2026-02-18T19:23:07Z","abstract_excerpt":"Language models based on discrete diffusion have attracted widespread interest for their potential to provide faster generation than autoregressive models. Despite their promise, these models typically produce samples whose quality sharply degrades in the few-step regime, preventing a dramatic speedup in practice. Here, we show that language models based on continuous flows over one-hot token embeddings can outperform discrete diffusion in both quality and speed. Importantly, our continuous formulation defines a unique flow map that can be learned directly for efficient few-step inference, a s"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We build a flow language model (FLM), a continuous flow that matches state-of-the-art discrete diffusion baselines on the One Billion Words (LM1B) and OpenWebText (OWT) datasets. We then distill FLM into a flow map language model (FMLM), whose one-step generation exceeds the 8-step quality of recent few-step discrete diffusion language models.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That a continuous flow defined over one-hot token embeddings can be learned such that the associated flow map preserves discrete token structure and yields high-quality samples without requiring additional discrete constraints or post-hoc corrections.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Continuous flow language models match discrete diffusion baselines and their distilled one-step flow map versions exceed 8-step discrete diffusion quality on LM1B and OWT.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Continuous flows over one-hot token embeddings match discrete diffusion quality and enable one-step generation that exceeds eight-step baselines.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"545df19e2a620974727488abf7917a660cd4c5c0eee55df3bd6581a678910b83"},"source":{"id":"2602.16813","kind":"arxiv","version":3},"verdict":{"id":"943cf335-0461-4d6b-9fce-90a69eab902f","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T20:58:47.776782Z","strongest_claim":"We build a flow language model (FLM), a continuous flow that matches state-of-the-art discrete diffusion baselines on the One Billion Words (LM1B) and OpenWebText (OWT) datasets. We then distill FLM into a flow map language model (FMLM), whose one-step generation exceeds the 8-step quality of recent few-step discrete diffusion language models.","one_line_summary":"Continuous flow language models match discrete diffusion baselines and their distilled one-step flow map versions exceed 8-step discrete diffusion quality on LM1B and OWT.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That a continuous flow defined over one-hot token embeddings can be learned such that the associated flow map preserves discrete token structure and yields high-quality samples without requiring additional discrete constraints or post-hoc corrections.","pith_extraction_headline":"Continuous flows over one-hot token embeddings match discrete diffusion quality and enable one-step generation that exceeds eight-step baselines."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2602.16813/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"7f32124911691fcbed92af0b53bc0e57042ff2a11d042a86f0ef76248fe8f36f"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2602.16813","created_at":"2026-05-21T02:04:59.369356+00:00"},{"alias_kind":"arxiv_version","alias_value":"2602.16813v3","created_at":"2026-05-21T02:04:59.369356+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2602.16813","created_at":"2026-05-21T02:04:59.369356+00:00"},{"alias_kind":"pith_short_12","alias_value":"YYYCE7JAXC76","created_at":"2026-05-21T02:04:59.369356+00:00"},{"alias_kind":"pith_short_16","alias_value":"YYYCE7JAXC76Q5PE","created_at":"2026-05-21T02:04:59.369356+00:00"},{"alias_kind":"pith_short_8","alias_value":"YYYCE7JA","created_at":"2026-05-21T02:04:59.369356+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":9,"internal_anchor_count":9,"sample":[{"citing_arxiv_id":"2605.23346","citing_title":"Contrastive Distribution Matching for Amortized Sequential Monte Carlo in Discrete Diffusion","ref_index":40,"is_internal_anchor":true},{"citing_arxiv_id":"2605.23605","citing_title":"DiLaDiff: Distilled Latent-Augmented Diffusion for Language Modeling","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18530","citing_title":"Continuous Diffusion Scales Competitively with Discrete Diffusion for Language","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19470","citing_title":"Drifting Objectives for Refining Discrete Diffusion Language Models","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13681","citing_title":"Sampling from Flow Language Models via Marginal-Conditioned Bridges","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10938","citing_title":"ELF: Embedded Language Flows","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2604.11748","citing_title":"LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07933","citing_title":"How to Train Your Latent Diffusion Language Model Jointly With the Latent Space","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07193","citing_title":"Coupling Models for One-Step Discrete Generation","ref_index":13,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/YYYCE7JAXC76Q5PEEXX4PR3CWE","json":"https://pith.science/pith/YYYCE7JAXC76Q5PEEXX4PR3CWE.json","graph_json":"https://pith.science/api/pith-number/YYYCE7JAXC76Q5PEEXX4PR3CWE/graph.json","events_json":"https://pith.science/api/pith-number/YYYCE7JAXC76Q5PEEXX4PR3CWE/events.json","paper":"https://pith.science/paper/YYYCE7JA"},"agent_actions":{"view_html":"https://pith.science/pith/YYYCE7JAXC76Q5PEEXX4PR3CWE","download_json":"https://pith.science/pith/YYYCE7JAXC76Q5PEEXX4PR3CWE.json","view_paper":"https://pith.science/paper/YYYCE7JA","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2602.16813&json=true","fetch_graph":"https://pith.science/api/pith-number/YYYCE7JAXC76Q5PEEXX4PR3CWE/graph.json","fetch_events":"https://pith.science/api/pith-number/YYYCE7JAXC76Q5PEEXX4PR3CWE/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/YYYCE7JAXC76Q5PEEXX4PR3CWE/action/timestamp_anchor","attest_storage":"https://pith.science/pith/YYYCE7JAXC76Q5PEEXX4PR3CWE/action/storage_attestation","attest_author":"https://pith.science/pith/YYYCE7JAXC76Q5PEEXX4PR3CWE/action/author_attestation","sign_citation":"https://pith.science/pith/YYYCE7JAXC76Q5PEEXX4PR3CWE/action/citation_signature","submit_replication":"https://pith.science/pith/YYYCE7JAXC76Q5PEEXX4PR3CWE/action/replication_record"}},"created_at":"2026-05-21T02:04:59.369356+00:00","updated_at":"2026-05-21T02:04:59.369356+00:00"}