pith. sign in

arxiv: 2603.03305 · v2 · pith:DNOJSZAEnew · submitted 2026-02-08 · 💻 cs.CL · cs.AI· cs.LG

The Hidden Cost of Structured Generation in LLMs: Draft-Conditioned Constrained Decoding

classification 💻 cs.CL cs.AIcs.LG
keywords constraineddecodingdraftdccdmodelstructureddraft-conditionedgeneration
0
0 comments X
read the original abstract

Large language models (LLMs) are increasingly used to generate executable outputs, JSON objects, and API calls, where a single syntax error can make the output unusable. Constrained decoding enforces validity token-by-token via masking and renormalization, but it can distort generation when the model assigns low probability mass to valid continuations, pushing decoding toward locally valid yet semantically incorrect trajectories. We propose \emph{Draft-Conditioned Constrained Decoding (DCCD)}, a simple two-step, training-free inference procedure that decouples semantic planning from structural enforcement: an unconstrained draft is generated first, and constrained decoding is then applied, conditioned on this draft, to guarantee validity. We analyze DCCD through a KL-projection view, showing that draft conditioning increases feasible mass and reduces the cumulative "projection tax" induced by hard constraints, with an optional best-of-$K$ draft selection. Across structured reasoning benchmarks, DCCD improves strict structured accuracy by up to +24 percentage points over standard constrained decoding (e.g., 15.2\% to 39.0\% on GSM8K with a 1B model), and enables smaller model pairs to match or exceed much larger constrained baselines, yielding substantial gains in parameter efficiency.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. The Extrapolation Cliff in On-Policy Distillation of Near-Deterministic Structured Outputs

    cs.LG 2026-05 unverdicted novelty 7.0

    On-policy distillation has an extrapolation cliff at closed-form lambda*(p,b,c) set by teacher modal probability, warm-start mass, and clip strength, past which training shifts from format-preserving to format-collapsing.

  2. Schema Key Wording as an Instruction Channel in Structured Generation under Constrained Decoding

    cs.CL 2026-04 unverdicted novelty 7.0

    Schema-key wording functions as an implicit instruction channel under constrained decoding, with experiments showing that rephrasing only the keys can substantially change accuracy on math benchmarks while prompt, mod...

  3. ATLAS-RTC: Closing the Loop on LLM Agent Output with Token-Level Runtime Control

    cs.LG 2026-03 unverdicted novelty 6.0

    ATLAS-RTC raises first-attempt success on structured LLM generation and tool calling by 20-37.8 points through closed-loop token-level interventions.