pith. sign in

arxiv: 2606.01926 · v1 · pith:OX6CCUL6new · submitted 2026-06-01 · 💻 cs.CL

Mitigating Bias in Locally Constrained Decoding via Tractable Proposals

Pith reviewed 2026-06-28 14:22 UTC · model grok-4.3

classification 💻 cs.CL
keywords constrained decodingsequential Monte Carlofinite automatalanguage modelsproposal distributionbias mitigationglobally constrained decodingSQL generation
0
0 comments X

The pith

Tensorized finite automata yield proposals that let SMC sampling converge to constrained language model distributions faster and with fewer particles than local masking.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models frequently violate output constraints such as JSON schemas or valid SQL. Local constrained decoding corrects this by masking invalid next tokens at each step, but the myopic choice introduces bias into the generated distribution. The paper represents constraints as finite automata, tensors them for GPU execution, and uses the resulting structure to build globally constrained decoding proposals. These proposals, along with probabilistic variants obtained by circuit multiplication with hidden Markov models, serve as better proposal distributions inside sequential Monte Carlo sampling. Experiments on function calling, keyword generation, and SQL tasks show that the new proposals reach the target distribution with substantially fewer particles under an otherwise identical SMC setup.

Core claim

Constraints expressed as finite automata can be tensorized for efficient GPU execution and share the same circuit structure as hidden Markov models. This structure supports the construction of globally constrained decoding (GCD) proposals and probabilistic GCD (P-GCD) proposals that incorporate both logical constraints and probabilistic information from the language model. Under the same sequential Monte Carlo sampling setup, (P-)GCD proposals converge to the target distribution p_lm(· | constraint) faster and require significantly fewer particles than locally constrained decoding proposals.

What carries the argument

Tensorized finite automaton representation of constraints, which permits GPU-efficient global proposals and circuit multiplication with hidden Markov models to form GCD and P-GCD proposals for SMC.

If this is right

  • Fewer particles suffice to obtain unbiased samples from language models under finite-automaton constraints.
  • The same SMC setup produces higher-quality function-calling and SQL outputs when GCD or P-GCD proposals replace local masking.
  • Bias from myopic token masking is reduced without changing the underlying language model or the SMC resampling schedule.
  • Tensorization enables the proposals to run at scale on modern hardware while preserving exact constraint satisfaction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same tensorized-automaton technique could be applied to other sampling algorithms that currently rely on local proposals.
  • If constraints outside regular languages can be approximated by finite automata, the efficiency gain may extend to broader classes of structured generation tasks.
  • Reducing the particle requirement lowers memory and latency costs for deploying constrained generation in production settings.

Load-bearing premise

Constraints can be specified as finite automata that admit tensorization for GPU execution and share circuit structure with hidden Markov models.

What would settle it

Run SMC with a fixed particle budget on a held-out SQL generation task and measure the KL divergence or total variation distance between the empirical output distribution and the true target; the claim is falsified if LCD reaches the same distance with equal or fewer particles than GCD.

Figures

Figures reproduced from arXiv: 2606.01926 by Guy Van den Broeck, Honghua Zhang, Jieyu Zhao, Linxin Song, Meihua Dang, Stefano Ermon.

Figure 1
Figure 1. Figure 1: An illustration of an NFA corresponding to a JSON schema object of the form {"x": <integer>}, together with its tensorized representation. It consists of a transition matrix Tout and an emission matrix Bf, which encode state transitions and symbol emissions, respectively. For simplicity, the incoming transition matrix Tin is omitted in the figure. specified as finite automata, and (ii) probabilistic glob￾a… view at source ↗
Figure 2
Figure 2. Figure 2: Accuracy (%) on xLAM as a function of the number of SMC particles comparing proposals LCD, GCD, and P-GCD. GCD, illustrating the benefit of the additional probabilistic information provided by the HMM. We additionally report constraint satisfaction rates at fixed particle budgets in Ta￾ble 1. LCD can fail to satisfy the constraint even at moderate particle budgets, partially explaining its lower accuracy c… view at source ↗
Figure 3
Figure 3. Figure 3: BLEU-4 on CommonGen as a function of the num￾ber of SMC particles. LCD and GCD configurations use the LM likelihood plm as the potential, with proposals qlcd and qgcd re￾spectively; ϕ-GCD uses qpgcd as the proposal and ϕpgcd as the potential. from the proposal distribution with temperature 1.0, we report the expected BLEU score, weighted over particles produced by SMC sampling. Experiments and Results. In … view at source ↗
Figure 4
Figure 4. Figure 4: NFA and DFA state counts as a function of required set, illustrating the exponential succinctness gap. (NFAs) and deterministic FAs (DFAs). In our experiments, xLAM and CommonGen use DFAs, while text-to-SQL uses NFAs. Here we present a more specific case study illus￾trating that NFAs can be exponentially more succinct than DFAs. Continuing with the function-calling task, we con￾sider the following definiti… view at source ↗
read the original abstract

Generations from large language models often fail to conform to desired constraints such as JSON schema. Existing locally constrained decoding (LCD) approaches enforce constraints by myopically masking out next tokens, resulting in biased sampling and degradation in performance. Recent work uses sequential Monte Carlo (SMC) methods to mitigate such biases, but designing effective proposal distributions or potential functions remains a key challenge. In this work, we propose a generic approach to construct proposals and potentials for SMC sampling from $p_{\mathrm{lm}}( \cdot \mid \mathrm{constraint})$. First, we show that constraints specified as finite automata can be tensorized for efficient execution on GPUs, which we use to construct globally constrained decoding (GCD) proposals. In addition, leveraging the fact that tensorized finite automata share the same circuit structure as hidden Markov models, we circuit-multiply them to obtain the probabilistic GCD (P-GCD) proposals encoding both logical and probabilistic information about the target distributions. We evaluate (P-)GCD on the tasks of function calling, keyword-based generation, and SQL generation. Experiments show that under the same SMC sampling setup, compared to LCD proposals, (P-)GCD converges faster to the target distribution with significantly fewer particles.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes a generic approach to construct proposals and potentials for SMC sampling from p_lm(· | constraint). Constraints specified as finite automata are tensorized for efficient GPU execution to build globally constrained decoding (GCD) proposals; these are further circuit-multiplied with hidden Markov models to obtain probabilistic GCD (P-GCD) proposals that encode both logical and probabilistic information. Experiments on function calling, keyword-based generation, and SQL generation show that, under the same SMC sampling setup, (P-)GCD converges faster to the target distribution with significantly fewer particles than LCD proposals.

Significance. If the empirical results hold, the work offers a tractable, GPU-efficient method for constructing effective proposals that incorporate both constraints and model probabilities, directly addressing bias in locally constrained decoding. The tensorization of automata and the HMM circuit-multiplication insight are practical strengths that could generalize beyond the evaluated tasks.

major comments (1)
  1. [Abstract] Abstract: the central claim that (P-)GCD converges faster with significantly fewer particles is presented without any quantitative metrics, particle counts, convergence curves, statistical tests, or error analysis, so the magnitude and reliability of the improvement cannot be assessed from the provided text.
minor comments (1)
  1. [Abstract] Abstract: the notation p_lm(· | constraint) appears without an explicit definition or reference to its construction.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the positive assessment of the work's significance and for the constructive comment on the abstract. We agree that the abstract would be strengthened by including quantitative details and will revise it accordingly in the resubmission.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that (P-)GCD converges faster with significantly fewer particles is presented without any quantitative metrics, particle counts, convergence curves, statistical tests, or error analysis, so the magnitude and reliability of the improvement cannot be assessed from the provided text.

    Authors: We agree with this observation. The current abstract summarizes the empirical findings at a high level without specific numbers. In the revised manuscript we will update the abstract to report key quantitative results from Section 5, including: convergence achieved with 8-32 particles for (P-)GCD versus 128-512 for LCD baselines on the three tasks; relative effective sample size improvements; and references to the convergence plots (Figures 3-5) that include error bars and statistical comparisons. These metrics are already present in the experimental section with full details on particle counts, wall-clock times, and variance estimates; the revision will simply surface representative values in the abstract itself. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The derivation constructs GCD proposals by tensorizing finite automata for GPU execution and P-GCD by circuit multiplication with HMM structures. These steps are presented as following directly from automata properties and shared circuit structure with HMMs, without any reduction to fitted parameters, self-definitions, or load-bearing self-citations. The empirical claim of faster SMC convergence is evaluated on downstream tasks and does not rely on renaming or importing uniqueness from prior author work. The paper is self-contained against external benchmarks with no quoted steps that collapse by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that finite automata constraints admit efficient tensorization and share circuit structure with HMMs; no free parameters or invented entities are mentioned in the abstract.

axioms (2)
  • domain assumption Constraints specified as finite automata can be tensorized for efficient execution on GPUs.
    Invoked to construct GCD proposals.
  • domain assumption Tensorized finite automata share the same circuit structure as hidden Markov models.
    Used to obtain P-GCD proposals by circuit multiplication.

pith-pipeline@v0.9.1-grok · 5762 in / 1192 out tokens · 24341 ms · 2026-06-28T14:22:31.816621+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

42 extracted references · 4 canonical work pages · 1 internal anchor

  1. [1]

    W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, Y., Wang, X., Dehghani, M., Brahma, S., Webson, A., Gu, S

    Chung, H. W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, Y., Wang, X., Dehghani, M., Brahma, S., Webson, A., Gu, S. S., Dai, Z., Suzgun, M., Chen, X., Chowdhery, A., Castro-Ros, A., Pellat, M., Robinson, K., Valter, D., Narang, S., Mishra, G., Yu, A., Zhao, V., Huang, Y., Dai, A., Yu, H., Petrov, S., Chi, E. H., Dean, J., Devlin, J., Roberts,...

  2. [2]

    Abs: Enforcing constraint satisfaction on generated sequences via automata-guided beam search

    Collura, V., Tit, K., Bussi, L., Giunchiglia, E., and Cordy, M. Abs: Enforcing constraint satisfaction on generated sequences via automata-guided beam search. arXiv preprint arXiv:2506.09701, 2025

  3. [3]

    F., Cai, Y., Xu, Z., Zhao, Y., Lai, R., and Chen, T

    Dong, Y., Ruan, C. F., Cai, Y., Xu, Z., Zhao, Y., Lai, R., and Chen, T. Xgrammar: Flexible and efficient structured generation engine for large language models. Proceedings of Machine Learning and Systems, 7, 2025

  4. [4]

    and Robson, E

    Evans, C. and Robson, E. W. automata: A python package for simulating and manipulating automata. Journal of Open Source Software, 2023

  5. [5]

    Gehman, S., Gururangan, S., Sap, M., Choi, Y., and Smith, N. A. R eal T oxicity P rompts: Evaluating neural toxic degeneration in language models. In Cohn, T., He, Y., and Liu, Y. (eds.), Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

  6. [6]

    arXiv preprint arXiv:2501.10868 , year=

    Geng, S., Cooper, H., Moskal, M., Jenkins, S., Berman, J., Ranchin, N., West, R., Horvitz, E., and Nori, H. Jsonschemabench: A rigorous benchmark of structured outputs for language models. arXiv preprint arXiv:2501.10868, 2025

  7. [7]

    A., Vaidya, S., Park, K., Ji, R., Berg-Kirkpatrick, T., and D'Antoni, L

    Gonzalez, E. A., Vaidya, S., Park, K., Ji, R., Berg-Kirkpatrick, T., and D'Antoni, L. Constrained sampling for language models should be easy: An MCMC perspective. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026

  8. [8]

    E., Motwani, R., and Ullman, J

    Hopcroft, J. E., Motwani, R., and Ullman, J. D. Introduction to Automata Theory, Languages and Computability. Addison-Wesley Longman Publishing Co., Inc., 2000

  9. [9]

    J., Jain, M., Elmoznino, E., Kaddar, Y., Lajoie, G., Bengio, Y., and Malkin, N

    Hu, E. J., Jain, M., Elmoznino, E., Kaddar, Y., Lajoie, G., Bengio, Y., and Malkin, N. Amortizing intractable inference in large language models. In The Twelfth International Conference on Learning Representations, 2024

  10. [10]

    D., McCann, B., Keskar, N

    Krause, B., Gotmare, A. D., McCann, B., Keskar, N. S., Joty, S., Socher, R., and Rajani, N. F. Gedi: Generative discriminator guided sequence generation. In Findings of the Association for Computational Linguistics: EMNLP 2021, pp.\ 4929--4952, 2021

  11. [11]

    Spider 2.0: Evaluating language models on real-world enterprise text-to- SQL workflows

    Lei, F., Chen, J., Ye, Y., Cao, R., Shin, D., SU, H., SUO, Z., Gao, H., Hu, W., Yin, P., Zhong, V., Xiong, C., Sun, R., Liu, Q., Wang, S., and Yu, T. Spider 2.0: Evaluating language models on real-world enterprise text-to- SQL workflows. In The Thirteenth International Conference on Learning Representations, 2025

  12. [12]

    Correctness-guaranteed code generation via constrained decoding

    Li, L., salar rahili, and Zhao, Y. Correctness-guaranteed code generation via constrained decoding. In The Second Conference on Language Modeling, 2025

  13. [13]

    Y., Zhou, W., Shen, M., Zhou, P., Bhagavatula, C., Choi, Y., and Ren, X

    Lin, B. Y., Zhou, W., Shen, M., Zhou, P., Bhagavatula, C., Choi, Y., and Ren, X. C ommon G en: A constrained text generation challenge for generative commonsense reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp.\ 1823--1840, Online, November 2020. Association for Computational Linguistics

  14. [14]

    H., Loula, J., MacIver, D

    Lipkin, B., LeBrun, B., Vigly, J. H., Loula, J., MacIver, D. R., Du, L., Eisner, J., Cotterell, R., Mansinghka, V., O'Donnell, T. J., Lew, A. K., and Vieira, T. Fast controlled generation from language models with adaptive weighted rejection sampling. In The Second Conference on Language Modeling, 2025

  15. [15]

    C., Wang, H., Heinecke, S., and Xiong, C

    Liu, Z., Hoang, T., Zhang, J., Zhu, M., Lan, T., Kokane, S., Tan, J., Yao, W., Liu, Z., Feng, Y., Murthy, R., Yang, L., Savarese, S., Niebles, J. C., Wang, H., Heinecke, S., and Xiong, C. Apigen: Automated pipeline for generating verifiable and diverse function-calling datasets. In Advances in Neural Information Processing Systems, 2024

  16. [16]

    What is the relationship between tensor factorizations and circuits (and how can we exploit it)? Transactions on Machine Learning Research, 2025, 2025

    Loconte, L., Mari, A., Gala, G., Peharz, R., de Campos, C., Quaeghebeur, E., Vessio, G., and Vergari, A. What is the relationship between tensor factorizations and circuits (and how can we exploit it)? Transactions on Machine Learning Research, 2025, 2025

  17. [17]

    K., Vieira, T., and O'Donnell, T

    Loula, J., LeBrun, B., Du, L., Lipkin, B., Pasti, C., Grand, G., Liu, T., Emara, Y., Freedman, M., Eisner, J., Cotterell, R., Mansinghka, V., Lew, A. K., Vieira, T., and O'Donnell, T. J. Syntactic and semantic control of large language models via sequential monte carlo. In The Thirteenth International Conference on Learning Representations, 2025

  18. [18]

    Bounding the capabilities of large language models in open text generation with prompt constraints

    Lu, A., Zhang, H., Zhang, Y., Wang, X., and Yang, D. Bounding the capabilities of large language models in open text generation with prompt constraints. In Findings of the Association for Computational Linguistics: EACL 2023, 2023

  19. [19]

    Controllable text generation with neurally-decomposed oracle

    Meng, T., Lu, S., Peng, N., and Chang, K.-W. Controllable text generation with neurally-decomposed oracle. In Advances in Neural Information Processing Systems 35, 2022

  20. [20]

    F., Leike, J., and Lowe, R

    Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P. F., Leike, J., and Lowe, R. Training language models to follow instructions with human feedback. In Advances in Neural Information Proces...

  21. [21]

    Bleu: a method for automatic evaluation of machine translation

    Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics (ACL), 2002

  22. [22]

    Grammar-aligned decoding

    Park, K., Wang, J., Berg-Kirkpatrick, T., Polikarpova, N., and D Antoni, L. Grammar-aligned decoding. In Advances in Neural Information Processing Systems, 2024

  23. [23]

    Flexible and efficient grammar-constrained decoding

    Park, K., Zhou, T., and D'Antoni, L. Flexible and efficient grammar-constrained decoding. arXiv preprint arXiv:2502.05111, 2025

  24. [24]

    G., Zhang, T., Wang, X., and Gonzalez, J

    Patil, S. G., Zhang, T., Wang, X., and Gonzalez, J. E. Gorilla: Large language model connected with massive apis. In Advances in Neural Information Processing Systems, 2024

  25. [25]

    G., Mao, H., Yan, F., Ji, C

    Patil, S. G., Mao, H., Yan, F., Ji, C. C.-J., Suresh, V., Stoica, I., and Gonzalez, J. E. The berkeley function calling leaderboard ( BFCL ): From tool use to agentic evaluation of large language models. In Forty-second International Conference on Machine Learning, 2025

  26. [26]

    Synchromesh: Reliable code generation from pre-trained language models

    Poesia, G., Polozov, A., Le, V., Tiwari, A., Soares, G., Meek, C., and Gulwani, S. Synchromesh: Reliable code generation from pre-trained language models. In The Eleventh International Conference on Learning Representations, 2022

  27. [27]

    Cold decoding: Energy-based constrained text generation with langevin dynamics

    Qin, L., Welleck, S., Khashabi, D., and Choi, Y. Cold decoding: Energy-based constrained text generation with langevin dynamics. In Advances in Neural Information Processing Systems, 2022

  28. [28]

    and Juang, B

    Rabiner, L. and Juang, B. An introduction to hidden markov models. IEEE ASSP Magazine, 3 0 (1), 1986

  29. [29]

    D., Ermon, S., and Finn, C

    Rafailov, R., Sharma, A., Mitchell, E., Manning, C. D., Ermon, S., and Finn, C. Direct preference optimization: Your language model is secretly a reward model. In Advances in Neural Information Processing Systems, 2023

  30. [30]

    PICARD : Parsing incrementally for constrained auto-regressive decoding from language models

    Scholak, T., Schucher, N., and Bahdanau, D. PICARD : Parsing incrementally for constrained auto-regressive decoding from language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2021

  31. [31]

    Tractable operations for arithmetic circuits of probabilistic models

    Shen, Y., Choi, A., and Darwiche, A. Tractable operations for arithmetic circuits of probabilistic models. Advances in Neural Information Processing Systems, 29, 2016

  32. [32]

    A., Pauls, A., Klein, D., Eisner, J., and Van Durme, B

    Shin, R., Lin, C., Thomson, S., Chen, C., Roy, S., Platanios, E. A., Pauls, A., Klein, D., Eisner, J., and Van Durme, B. Constrained language models yield few-shot semantic parsers. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2021

  33. [33]

    Syncode: LLM generation with grammar augmentation

    Ugare, S., Suresh, T., Kang, H., Misailovic, S., and Singh, G. Syncode: LLM generation with grammar augmentation. Transactions on Machine Learning Research, 2025

  34. [34]

    A compositional atlas of tractable circuit operations for probabilistic inference

    Vergari, A., Choi, Y., Liu, A., Teso, S., and Van den Broeck, G. A compositional atlas of tractable circuit operations for probabilistic inference. Advances in Neural Information Processing Systems, 34: 0 13189--13201, 2021

  35. [35]

    W., Lester, B., Du, N., Dai, A

    Wei, J., Bosma, M., Zhao, V., Guu, K., Yu, A. W., Lester, B., Du, N., Dai, A. M., and Le, Q. V. Finetuned language models are zero-shot learners. In The Eleventh International Conference on Learning Representations, 2022

  36. [36]

    Willard, B. T. and Louf, R. Efficient guided generation for large language models. arXiv e-prints, pp.\ arXiv--2307, 2023

  37. [37]

    and Klein, D

    Yang, K. and Klein, D. Fudge: Controlled text generation with future discriminators. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2021

  38. [38]

    $\tau$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

    Yao, S., Shinn, N., Razavi, P., and Narasimhan, K. -bench: A benchmark for tool-agent-user interaction in real-world domains. arXiv preprint arXiv:2406.12045, 2024

  39. [39]

    S pider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to- SQL task

    Yu, T., Zhang, R., Yang, K., Yasunaga, M., Wang, D., Li, Z., Ma, J., Li, I., Yao, Q., Roman, S., Zhang, Z., and Radev, D. S pider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to- SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018

  40. [40]

    Tractable control for autoregressive language generation

    Zhang, H., Dang, M., Peng, N., and Van Den Broeck, G. Tractable control for autoregressive language generation. In The Fortieth International Conference on Machine Learning, 2023

  41. [41]

    Adaptable logical control for large language models

    Zhang, H., Kung, P.-N., Yoshida, M., Van den Broeck, G., and Peng, N. Adaptable logical control for large language models. In Advances in Neural Information Processing Systems, 2024

  42. [42]

    Zhang, H., Dang, M., Wang, B., Ermon, S., Peng, N., and den Broeck, G. V. Scaling probabilistic circuits via monarch matrices. In The Forty-second International Conference on Machine Learning, 2025