Mitigating Bias in Locally Constrained Decoding via Tractable Proposals

Guy Van den Broeck; Honghua Zhang; Jieyu Zhao; Linxin Song; Meihua Dang; Stefano Ermon

arxiv: 2606.01926 · v1 · pith:OX6CCUL6new · submitted 2026-06-01 · 💻 cs.CL

Mitigating Bias in Locally Constrained Decoding via Tractable Proposals

Meihua Dang , Linxin Song , Honghua Zhang , Jieyu Zhao , Guy Van den Broeck , Stefano Ermon This is my paper

Pith reviewed 2026-06-28 14:22 UTC · model grok-4.3

classification 💻 cs.CL

keywords constrained decodingsequential Monte Carlofinite automatalanguage modelsproposal distributionbias mitigationglobally constrained decodingSQL generation

0 comments

The pith

Tensorized finite automata yield proposals that let SMC sampling converge to constrained language model distributions faster and with fewer particles than local masking.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models frequently violate output constraints such as JSON schemas or valid SQL. Local constrained decoding corrects this by masking invalid next tokens at each step, but the myopic choice introduces bias into the generated distribution. The paper represents constraints as finite automata, tensors them for GPU execution, and uses the resulting structure to build globally constrained decoding proposals. These proposals, along with probabilistic variants obtained by circuit multiplication with hidden Markov models, serve as better proposal distributions inside sequential Monte Carlo sampling. Experiments on function calling, keyword generation, and SQL tasks show that the new proposals reach the target distribution with substantially fewer particles under an otherwise identical SMC setup.

Core claim

Constraints expressed as finite automata can be tensorized for efficient GPU execution and share the same circuit structure as hidden Markov models. This structure supports the construction of globally constrained decoding (GCD) proposals and probabilistic GCD (P-GCD) proposals that incorporate both logical constraints and probabilistic information from the language model. Under the same sequential Monte Carlo sampling setup, (P-)GCD proposals converge to the target distribution p_lm(· | constraint) faster and require significantly fewer particles than locally constrained decoding proposals.

What carries the argument

Tensorized finite automaton representation of constraints, which permits GPU-efficient global proposals and circuit multiplication with hidden Markov models to form GCD and P-GCD proposals for SMC.

If this is right

Fewer particles suffice to obtain unbiased samples from language models under finite-automaton constraints.
The same SMC setup produces higher-quality function-calling and SQL outputs when GCD or P-GCD proposals replace local masking.
Bias from myopic token masking is reduced without changing the underlying language model or the SMC resampling schedule.
Tensorization enables the proposals to run at scale on modern hardware while preserving exact constraint satisfaction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same tensorized-automaton technique could be applied to other sampling algorithms that currently rely on local proposals.
If constraints outside regular languages can be approximated by finite automata, the efficiency gain may extend to broader classes of structured generation tasks.
Reducing the particle requirement lowers memory and latency costs for deploying constrained generation in production settings.

Load-bearing premise

Constraints can be specified as finite automata that admit tensorization for GPU execution and share circuit structure with hidden Markov models.

What would settle it

Run SMC with a fixed particle budget on a held-out SQL generation task and measure the KL divergence or total variation distance between the empirical output distribution and the true target; the claim is falsified if LCD reaches the same distance with equal or fewer particles than GCD.

Figures

Figures reproduced from arXiv: 2606.01926 by Guy Van den Broeck, Honghua Zhang, Jieyu Zhao, Linxin Song, Meihua Dang, Stefano Ermon.

**Figure 1.** Figure 1: An illustration of an NFA corresponding to a JSON schema object of the form {"x": <integer>}, together with its tensorized representation. It consists of a transition matrix Tout and an emission matrix Bf, which encode state transitions and symbol emissions, respectively. For simplicity, the incoming transition matrix Tin is omitted in the figure. specified as finite automata, and (ii) probabilistic globa… view at source ↗

**Figure 2.** Figure 2: Accuracy (%) on xLAM as a function of the number of SMC particles comparing proposals LCD, GCD, and P-GCD. GCD, illustrating the benefit of the additional probabilistic information provided by the HMM. We additionally report constraint satisfaction rates at fixed particle budgets in Table 1. LCD can fail to satisfy the constraint even at moderate particle budgets, partially explaining its lower accuracy c… view at source ↗

**Figure 3.** Figure 3: BLEU-4 on CommonGen as a function of the number of SMC particles. LCD and GCD configurations use the LM likelihood plm as the potential, with proposals qlcd and qgcd respectively; ϕ-GCD uses qpgcd as the proposal and ϕpgcd as the potential. from the proposal distribution with temperature 1.0, we report the expected BLEU score, weighted over particles produced by SMC sampling. Experiments and Results. In … view at source ↗

**Figure 4.** Figure 4: NFA and DFA state counts as a function of required set, illustrating the exponential succinctness gap. (NFAs) and deterministic FAs (DFAs). In our experiments, xLAM and CommonGen use DFAs, while text-to-SQL uses NFAs. Here we present a more specific case study illustrating that NFAs can be exponentially more succinct than DFAs. Continuing with the function-calling task, we consider the following definiti… view at source ↗

read the original abstract

Generations from large language models often fail to conform to desired constraints such as JSON schema. Existing locally constrained decoding (LCD) approaches enforce constraints by myopically masking out next tokens, resulting in biased sampling and degradation in performance. Recent work uses sequential Monte Carlo (SMC) methods to mitigate such biases, but designing effective proposal distributions or potential functions remains a key challenge. In this work, we propose a generic approach to construct proposals and potentials for SMC sampling from $p_{\mathrm{lm}}( \cdot \mid \mathrm{constraint})$. First, we show that constraints specified as finite automata can be tensorized for efficient execution on GPUs, which we use to construct globally constrained decoding (GCD) proposals. In addition, leveraging the fact that tensorized finite automata share the same circuit structure as hidden Markov models, we circuit-multiply them to obtain the probabilistic GCD (P-GCD) proposals encoding both logical and probabilistic information about the target distributions. We evaluate (P-)GCD on the tasks of function calling, keyword-based generation, and SQL generation. Experiments show that under the same SMC sampling setup, compared to LCD proposals, (P-)GCD converges faster to the target distribution with significantly fewer particles.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a practical way to build globally valid proposals for SMC in constrained decoding by tensorizing automata and multiplying circuits with HMMs.

read the letter

The core contribution is turning finite automata constraints into tensorized forms that run efficiently on GPU and then using circuit multiplication to fold in probabilistic information from the language model. This produces GCD and P-GCD proposals that target the constrained distribution more directly than the usual local masking used in LCD. The approach is generic across any constraint expressible as an automaton, which covers the JSON, keyword, and SQL cases they test.

What stands out is the explicit link to hidden Markov model circuit structure, which lets them keep the proposal both valid and cheap to evaluate. The experiments report faster convergence to the target under the same SMC setup and with fewer particles, which matches the claim that local proposals introduce bias that global ones avoid.

The main limitation visible from the abstract is that the derivation of the tensorization and the exact form of the potential functions are not shown here; a reader would need the full proofs and pseudocode to judge how general the construction really is and whether the HMM multiplication preserves the right marginals. The tasks are standard but narrow, so it is not yet clear how the method scales when the automaton grows or when the constraint is more complex than the examples.

This is the kind of paper that constrained-decoding and SMC people will want to read. It supplies a concrete, implementable idea rather than another high-level observation about bias. I would send it to review; the central construction looks worth checking in detail even if the empirical gains turn out to be task-dependent.

Referee Report

1 major / 1 minor

Summary. The paper proposes a generic approach to construct proposals and potentials for SMC sampling from p_lm(· | constraint). Constraints specified as finite automata are tensorized for efficient GPU execution to build globally constrained decoding (GCD) proposals; these are further circuit-multiplied with hidden Markov models to obtain probabilistic GCD (P-GCD) proposals that encode both logical and probabilistic information. Experiments on function calling, keyword-based generation, and SQL generation show that, under the same SMC sampling setup, (P-)GCD converges faster to the target distribution with significantly fewer particles than LCD proposals.

Significance. If the empirical results hold, the work offers a tractable, GPU-efficient method for constructing effective proposals that incorporate both constraints and model probabilities, directly addressing bias in locally constrained decoding. The tensorization of automata and the HMM circuit-multiplication insight are practical strengths that could generalize beyond the evaluated tasks.

major comments (1)

[Abstract] Abstract: the central claim that (P-)GCD converges faster with significantly fewer particles is presented without any quantitative metrics, particle counts, convergence curves, statistical tests, or error analysis, so the magnitude and reliability of the improvement cannot be assessed from the provided text.

minor comments (1)

[Abstract] Abstract: the notation p_lm(· | constraint) appears without an explicit definition or reference to its construction.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the positive assessment of the work's significance and for the constructive comment on the abstract. We agree that the abstract would be strengthened by including quantitative details and will revise it accordingly in the resubmission.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that (P-)GCD converges faster with significantly fewer particles is presented without any quantitative metrics, particle counts, convergence curves, statistical tests, or error analysis, so the magnitude and reliability of the improvement cannot be assessed from the provided text.

Authors: We agree with this observation. The current abstract summarizes the empirical findings at a high level without specific numbers. In the revised manuscript we will update the abstract to report key quantitative results from Section 5, including: convergence achieved with 8-32 particles for (P-)GCD versus 128-512 for LCD baselines on the three tasks; relative effective sample size improvements; and references to the convergence plots (Figures 3-5) that include error bars and statistical comparisons. These metrics are already present in the experimental section with full details on particle counts, wall-clock times, and variance estimates; the revision will simply surface representative values in the abstract itself. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The derivation constructs GCD proposals by tensorizing finite automata for GPU execution and P-GCD by circuit multiplication with HMM structures. These steps are presented as following directly from automata properties and shared circuit structure with HMMs, without any reduction to fitted parameters, self-definitions, or load-bearing self-citations. The empirical claim of faster SMC convergence is evaluated on downstream tasks and does not rely on renaming or importing uniqueness from prior author work. The paper is self-contained against external benchmarks with no quoted steps that collapse by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that finite automata constraints admit efficient tensorization and share circuit structure with HMMs; no free parameters or invented entities are mentioned in the abstract.

axioms (2)

domain assumption Constraints specified as finite automata can be tensorized for efficient execution on GPUs.
Invoked to construct GCD proposals.
domain assumption Tensorized finite automata share the same circuit structure as hidden Markov models.
Used to obtain P-GCD proposals by circuit multiplication.

pith-pipeline@v0.9.1-grok · 5762 in / 1192 out tokens · 24341 ms · 2026-06-28T14:22:31.816621+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 4 canonical work pages · 1 internal anchor

[1]

W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, Y., Wang, X., Dehghani, M., Brahma, S., Webson, A., Gu, S

Chung, H. W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, Y., Wang, X., Dehghani, M., Brahma, S., Webson, A., Gu, S. S., Dai, Z., Suzgun, M., Chen, X., Chowdhery, A., Castro-Ros, A., Pellat, M., Robinson, K., Valter, D., Narang, S., Mishra, G., Yu, A., Zhao, V., Huang, Y., Dai, A., Yu, H., Petrov, S., Chi, E. H., Dean, J., Devlin, J., Roberts,...

2024
[2]

Abs: Enforcing constraint satisfaction on generated sequences via automata-guided beam search

Collura, V., Tit, K., Bussi, L., Giunchiglia, E., and Cordy, M. Abs: Enforcing constraint satisfaction on generated sequences via automata-guided beam search. arXiv preprint arXiv:2506.09701, 2025

work page arXiv 2025
[3]

F., Cai, Y., Xu, Z., Zhao, Y., Lai, R., and Chen, T

Dong, Y., Ruan, C. F., Cai, Y., Xu, Z., Zhao, Y., Lai, R., and Chen, T. Xgrammar: Flexible and efficient structured generation engine for large language models. Proceedings of Machine Learning and Systems, 7, 2025

2025
[4]

and Robson, E

Evans, C. and Robson, E. W. automata: A python package for simulating and manipulating automata. Journal of Open Source Software, 2023

2023
[5]

Gehman, S., Gururangan, S., Sap, M., Choi, Y., and Smith, N. A. R eal T oxicity P rompts: Evaluating neural toxic degeneration in language models. In Cohn, T., He, Y., and Liu, Y. (eds.), Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

2020
[6]

arXiv preprint arXiv:2501.10868 , year=

Geng, S., Cooper, H., Moskal, M., Jenkins, S., Berman, J., Ranchin, N., West, R., Horvitz, E., and Nori, H. Jsonschemabench: A rigorous benchmark of structured outputs for language models. arXiv preprint arXiv:2501.10868, 2025

work page arXiv 2025
[7]

A., Vaidya, S., Park, K., Ji, R., Berg-Kirkpatrick, T., and D'Antoni, L

Gonzalez, E. A., Vaidya, S., Park, K., Ji, R., Berg-Kirkpatrick, T., and D'Antoni, L. Constrained sampling for language models should be easy: An MCMC perspective. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026

2026
[8]

E., Motwani, R., and Ullman, J

Hopcroft, J. E., Motwani, R., and Ullman, J. D. Introduction to Automata Theory, Languages and Computability. Addison-Wesley Longman Publishing Co., Inc., 2000

2000
[9]

J., Jain, M., Elmoznino, E., Kaddar, Y., Lajoie, G., Bengio, Y., and Malkin, N

Hu, E. J., Jain, M., Elmoznino, E., Kaddar, Y., Lajoie, G., Bengio, Y., and Malkin, N. Amortizing intractable inference in large language models. In The Twelfth International Conference on Learning Representations, 2024

2024
[10]

D., McCann, B., Keskar, N

Krause, B., Gotmare, A. D., McCann, B., Keskar, N. S., Joty, S., Socher, R., and Rajani, N. F. Gedi: Generative discriminator guided sequence generation. In Findings of the Association for Computational Linguistics: EMNLP 2021, pp.\ 4929--4952, 2021

2021
[11]

Spider 2.0: Evaluating language models on real-world enterprise text-to- SQL workflows

Lei, F., Chen, J., Ye, Y., Cao, R., Shin, D., SU, H., SUO, Z., Gao, H., Hu, W., Yin, P., Zhong, V., Xiong, C., Sun, R., Liu, Q., Wang, S., and Yu, T. Spider 2.0: Evaluating language models on real-world enterprise text-to- SQL workflows. In The Thirteenth International Conference on Learning Representations, 2025

2025
[12]

Correctness-guaranteed code generation via constrained decoding

Li, L., salar rahili, and Zhao, Y. Correctness-guaranteed code generation via constrained decoding. In The Second Conference on Language Modeling, 2025

2025
[13]

Y., Zhou, W., Shen, M., Zhou, P., Bhagavatula, C., Choi, Y., and Ren, X

Lin, B. Y., Zhou, W., Shen, M., Zhou, P., Bhagavatula, C., Choi, Y., and Ren, X. C ommon G en: A constrained text generation challenge for generative commonsense reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp.\ 1823--1840, Online, November 2020. Association for Computational Linguistics

2020
[14]

H., Loula, J., MacIver, D

Lipkin, B., LeBrun, B., Vigly, J. H., Loula, J., MacIver, D. R., Du, L., Eisner, J., Cotterell, R., Mansinghka, V., O'Donnell, T. J., Lew, A. K., and Vieira, T. Fast controlled generation from language models with adaptive weighted rejection sampling. In The Second Conference on Language Modeling, 2025

2025
[15]

C., Wang, H., Heinecke, S., and Xiong, C

Liu, Z., Hoang, T., Zhang, J., Zhu, M., Lan, T., Kokane, S., Tan, J., Yao, W., Liu, Z., Feng, Y., Murthy, R., Yang, L., Savarese, S., Niebles, J. C., Wang, H., Heinecke, S., and Xiong, C. Apigen: Automated pipeline for generating verifiable and diverse function-calling datasets. In Advances in Neural Information Processing Systems, 2024

2024
[16]

What is the relationship between tensor factorizations and circuits (and how can we exploit it)? Transactions on Machine Learning Research, 2025, 2025

Loconte, L., Mari, A., Gala, G., Peharz, R., de Campos, C., Quaeghebeur, E., Vessio, G., and Vergari, A. What is the relationship between tensor factorizations and circuits (and how can we exploit it)? Transactions on Machine Learning Research, 2025, 2025

2025
[17]

K., Vieira, T., and O'Donnell, T

Loula, J., LeBrun, B., Du, L., Lipkin, B., Pasti, C., Grand, G., Liu, T., Emara, Y., Freedman, M., Eisner, J., Cotterell, R., Mansinghka, V., Lew, A. K., Vieira, T., and O'Donnell, T. J. Syntactic and semantic control of large language models via sequential monte carlo. In The Thirteenth International Conference on Learning Representations, 2025

2025
[18]

Bounding the capabilities of large language models in open text generation with prompt constraints

Lu, A., Zhang, H., Zhang, Y., Wang, X., and Yang, D. Bounding the capabilities of large language models in open text generation with prompt constraints. In Findings of the Association for Computational Linguistics: EACL 2023, 2023

2023
[19]

Controllable text generation with neurally-decomposed oracle

Meng, T., Lu, S., Peng, N., and Chang, K.-W. Controllable text generation with neurally-decomposed oracle. In Advances in Neural Information Processing Systems 35, 2022

2022
[20]

F., Leike, J., and Lowe, R

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P. F., Leike, J., and Lowe, R. Training language models to follow instructions with human feedback. In Advances in Neural Information Proces...

2022
[21]

Bleu: a method for automatic evaluation of machine translation

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics (ACL), 2002

2002
[22]

Grammar-aligned decoding

Park, K., Wang, J., Berg-Kirkpatrick, T., Polikarpova, N., and D Antoni, L. Grammar-aligned decoding. In Advances in Neural Information Processing Systems, 2024

2024
[23]

Flexible and efficient grammar-constrained decoding

Park, K., Zhou, T., and D'Antoni, L. Flexible and efficient grammar-constrained decoding. arXiv preprint arXiv:2502.05111, 2025

work page arXiv 2025
[24]

G., Zhang, T., Wang, X., and Gonzalez, J

Patil, S. G., Zhang, T., Wang, X., and Gonzalez, J. E. Gorilla: Large language model connected with massive apis. In Advances in Neural Information Processing Systems, 2024

2024
[25]

G., Mao, H., Yan, F., Ji, C

Patil, S. G., Mao, H., Yan, F., Ji, C. C.-J., Suresh, V., Stoica, I., and Gonzalez, J. E. The berkeley function calling leaderboard ( BFCL ): From tool use to agentic evaluation of large language models. In Forty-second International Conference on Machine Learning, 2025

2025
[26]

Synchromesh: Reliable code generation from pre-trained language models

Poesia, G., Polozov, A., Le, V., Tiwari, A., Soares, G., Meek, C., and Gulwani, S. Synchromesh: Reliable code generation from pre-trained language models. In The Eleventh International Conference on Learning Representations, 2022

2022
[27]

Cold decoding: Energy-based constrained text generation with langevin dynamics

Qin, L., Welleck, S., Khashabi, D., and Choi, Y. Cold decoding: Energy-based constrained text generation with langevin dynamics. In Advances in Neural Information Processing Systems, 2022

2022
[28]

and Juang, B

Rabiner, L. and Juang, B. An introduction to hidden markov models. IEEE ASSP Magazine, 3 0 (1), 1986

1986
[29]

D., Ermon, S., and Finn, C

Rafailov, R., Sharma, A., Mitchell, E., Manning, C. D., Ermon, S., and Finn, C. Direct preference optimization: Your language model is secretly a reward model. In Advances in Neural Information Processing Systems, 2023

2023
[30]

PICARD : Parsing incrementally for constrained auto-regressive decoding from language models

Scholak, T., Schucher, N., and Bahdanau, D. PICARD : Parsing incrementally for constrained auto-regressive decoding from language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2021

2021
[31]

Tractable operations for arithmetic circuits of probabilistic models

Shen, Y., Choi, A., and Darwiche, A. Tractable operations for arithmetic circuits of probabilistic models. Advances in Neural Information Processing Systems, 29, 2016

2016
[32]

A., Pauls, A., Klein, D., Eisner, J., and Van Durme, B

Shin, R., Lin, C., Thomson, S., Chen, C., Roy, S., Platanios, E. A., Pauls, A., Klein, D., Eisner, J., and Van Durme, B. Constrained language models yield few-shot semantic parsers. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2021

2021
[33]

Syncode: LLM generation with grammar augmentation

Ugare, S., Suresh, T., Kang, H., Misailovic, S., and Singh, G. Syncode: LLM generation with grammar augmentation. Transactions on Machine Learning Research, 2025

2025
[34]

A compositional atlas of tractable circuit operations for probabilistic inference

Vergari, A., Choi, Y., Liu, A., Teso, S., and Van den Broeck, G. A compositional atlas of tractable circuit operations for probabilistic inference. Advances in Neural Information Processing Systems, 34: 0 13189--13201, 2021

2021
[35]

W., Lester, B., Du, N., Dai, A

Wei, J., Bosma, M., Zhao, V., Guu, K., Yu, A. W., Lester, B., Du, N., Dai, A. M., and Le, Q. V. Finetuned language models are zero-shot learners. In The Eleventh International Conference on Learning Representations, 2022

2022
[36]

Willard, B. T. and Louf, R. Efficient guided generation for large language models. arXiv e-prints, pp.\ arXiv--2307, 2023

2023
[37]

and Klein, D

Yang, K. and Klein, D. Fudge: Controlled text generation with future discriminators. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2021

2021
[38]

$\tau$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

Yao, S., Shinn, N., Razavi, P., and Narasimhan, K. -bench: A benchmark for tool-agent-user interaction in real-world domains. arXiv preprint arXiv:2406.12045, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[39]

S pider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to- SQL task

Yu, T., Zhang, R., Yang, K., Yasunaga, M., Wang, D., Li, Z., Ma, J., Li, I., Yao, Q., Roman, S., Zhang, Z., and Radev, D. S pider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to- SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018

2018
[40]

Tractable control for autoregressive language generation

Zhang, H., Dang, M., Peng, N., and Van Den Broeck, G. Tractable control for autoregressive language generation. In The Fortieth International Conference on Machine Learning, 2023

2023
[41]

Adaptable logical control for large language models

Zhang, H., Kung, P.-N., Yoshida, M., Van den Broeck, G., and Peng, N. Adaptable logical control for large language models. In Advances in Neural Information Processing Systems, 2024

2024
[42]

Zhang, H., Dang, M., Wang, B., Ermon, S., Peng, N., and den Broeck, G. V. Scaling probabilistic circuits via monarch matrices. In The Forty-second International Conference on Machine Learning, 2025

2025

[1] [1]

W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, Y., Wang, X., Dehghani, M., Brahma, S., Webson, A., Gu, S

Chung, H. W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, Y., Wang, X., Dehghani, M., Brahma, S., Webson, A., Gu, S. S., Dai, Z., Suzgun, M., Chen, X., Chowdhery, A., Castro-Ros, A., Pellat, M., Robinson, K., Valter, D., Narang, S., Mishra, G., Yu, A., Zhao, V., Huang, Y., Dai, A., Yu, H., Petrov, S., Chi, E. H., Dean, J., Devlin, J., Roberts,...

2024

[2] [2]

Abs: Enforcing constraint satisfaction on generated sequences via automata-guided beam search

Collura, V., Tit, K., Bussi, L., Giunchiglia, E., and Cordy, M. Abs: Enforcing constraint satisfaction on generated sequences via automata-guided beam search. arXiv preprint arXiv:2506.09701, 2025

work page arXiv 2025

[3] [3]

F., Cai, Y., Xu, Z., Zhao, Y., Lai, R., and Chen, T

Dong, Y., Ruan, C. F., Cai, Y., Xu, Z., Zhao, Y., Lai, R., and Chen, T. Xgrammar: Flexible and efficient structured generation engine for large language models. Proceedings of Machine Learning and Systems, 7, 2025

2025

[4] [4]

and Robson, E

Evans, C. and Robson, E. W. automata: A python package for simulating and manipulating automata. Journal of Open Source Software, 2023

2023

[5] [5]

Gehman, S., Gururangan, S., Sap, M., Choi, Y., and Smith, N. A. R eal T oxicity P rompts: Evaluating neural toxic degeneration in language models. In Cohn, T., He, Y., and Liu, Y. (eds.), Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

2020

[6] [6]

arXiv preprint arXiv:2501.10868 , year=

Geng, S., Cooper, H., Moskal, M., Jenkins, S., Berman, J., Ranchin, N., West, R., Horvitz, E., and Nori, H. Jsonschemabench: A rigorous benchmark of structured outputs for language models. arXiv preprint arXiv:2501.10868, 2025

work page arXiv 2025

[7] [7]

A., Vaidya, S., Park, K., Ji, R., Berg-Kirkpatrick, T., and D'Antoni, L

Gonzalez, E. A., Vaidya, S., Park, K., Ji, R., Berg-Kirkpatrick, T., and D'Antoni, L. Constrained sampling for language models should be easy: An MCMC perspective. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026

2026

[8] [8]

E., Motwani, R., and Ullman, J

Hopcroft, J. E., Motwani, R., and Ullman, J. D. Introduction to Automata Theory, Languages and Computability. Addison-Wesley Longman Publishing Co., Inc., 2000

2000

[9] [9]

J., Jain, M., Elmoznino, E., Kaddar, Y., Lajoie, G., Bengio, Y., and Malkin, N

Hu, E. J., Jain, M., Elmoznino, E., Kaddar, Y., Lajoie, G., Bengio, Y., and Malkin, N. Amortizing intractable inference in large language models. In The Twelfth International Conference on Learning Representations, 2024

2024

[10] [10]

D., McCann, B., Keskar, N

Krause, B., Gotmare, A. D., McCann, B., Keskar, N. S., Joty, S., Socher, R., and Rajani, N. F. Gedi: Generative discriminator guided sequence generation. In Findings of the Association for Computational Linguistics: EMNLP 2021, pp.\ 4929--4952, 2021

2021

[11] [11]

Spider 2.0: Evaluating language models on real-world enterprise text-to- SQL workflows

Lei, F., Chen, J., Ye, Y., Cao, R., Shin, D., SU, H., SUO, Z., Gao, H., Hu, W., Yin, P., Zhong, V., Xiong, C., Sun, R., Liu, Q., Wang, S., and Yu, T. Spider 2.0: Evaluating language models on real-world enterprise text-to- SQL workflows. In The Thirteenth International Conference on Learning Representations, 2025

2025

[12] [12]

Correctness-guaranteed code generation via constrained decoding

Li, L., salar rahili, and Zhao, Y. Correctness-guaranteed code generation via constrained decoding. In The Second Conference on Language Modeling, 2025

2025

[13] [13]

Y., Zhou, W., Shen, M., Zhou, P., Bhagavatula, C., Choi, Y., and Ren, X

Lin, B. Y., Zhou, W., Shen, M., Zhou, P., Bhagavatula, C., Choi, Y., and Ren, X. C ommon G en: A constrained text generation challenge for generative commonsense reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp.\ 1823--1840, Online, November 2020. Association for Computational Linguistics

2020

[14] [14]

H., Loula, J., MacIver, D

Lipkin, B., LeBrun, B., Vigly, J. H., Loula, J., MacIver, D. R., Du, L., Eisner, J., Cotterell, R., Mansinghka, V., O'Donnell, T. J., Lew, A. K., and Vieira, T. Fast controlled generation from language models with adaptive weighted rejection sampling. In The Second Conference on Language Modeling, 2025

2025

[15] [15]

C., Wang, H., Heinecke, S., and Xiong, C

Liu, Z., Hoang, T., Zhang, J., Zhu, M., Lan, T., Kokane, S., Tan, J., Yao, W., Liu, Z., Feng, Y., Murthy, R., Yang, L., Savarese, S., Niebles, J. C., Wang, H., Heinecke, S., and Xiong, C. Apigen: Automated pipeline for generating verifiable and diverse function-calling datasets. In Advances in Neural Information Processing Systems, 2024

2024

[16] [16]

What is the relationship between tensor factorizations and circuits (and how can we exploit it)? Transactions on Machine Learning Research, 2025, 2025

Loconte, L., Mari, A., Gala, G., Peharz, R., de Campos, C., Quaeghebeur, E., Vessio, G., and Vergari, A. What is the relationship between tensor factorizations and circuits (and how can we exploit it)? Transactions on Machine Learning Research, 2025, 2025

2025

[17] [17]

K., Vieira, T., and O'Donnell, T

Loula, J., LeBrun, B., Du, L., Lipkin, B., Pasti, C., Grand, G., Liu, T., Emara, Y., Freedman, M., Eisner, J., Cotterell, R., Mansinghka, V., Lew, A. K., Vieira, T., and O'Donnell, T. J. Syntactic and semantic control of large language models via sequential monte carlo. In The Thirteenth International Conference on Learning Representations, 2025

2025

[18] [18]

Bounding the capabilities of large language models in open text generation with prompt constraints

Lu, A., Zhang, H., Zhang, Y., Wang, X., and Yang, D. Bounding the capabilities of large language models in open text generation with prompt constraints. In Findings of the Association for Computational Linguistics: EACL 2023, 2023

2023

[19] [19]

Controllable text generation with neurally-decomposed oracle

Meng, T., Lu, S., Peng, N., and Chang, K.-W. Controllable text generation with neurally-decomposed oracle. In Advances in Neural Information Processing Systems 35, 2022

2022

[20] [20]

F., Leike, J., and Lowe, R

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P. F., Leike, J., and Lowe, R. Training language models to follow instructions with human feedback. In Advances in Neural Information Proces...

2022

[21] [21]

Bleu: a method for automatic evaluation of machine translation

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics (ACL), 2002

2002

[22] [22]

Grammar-aligned decoding

Park, K., Wang, J., Berg-Kirkpatrick, T., Polikarpova, N., and D Antoni, L. Grammar-aligned decoding. In Advances in Neural Information Processing Systems, 2024

2024

[23] [23]

Flexible and efficient grammar-constrained decoding

Park, K., Zhou, T., and D'Antoni, L. Flexible and efficient grammar-constrained decoding. arXiv preprint arXiv:2502.05111, 2025

work page arXiv 2025

[24] [24]

G., Zhang, T., Wang, X., and Gonzalez, J

Patil, S. G., Zhang, T., Wang, X., and Gonzalez, J. E. Gorilla: Large language model connected with massive apis. In Advances in Neural Information Processing Systems, 2024

2024

[25] [25]

G., Mao, H., Yan, F., Ji, C

Patil, S. G., Mao, H., Yan, F., Ji, C. C.-J., Suresh, V., Stoica, I., and Gonzalez, J. E. The berkeley function calling leaderboard ( BFCL ): From tool use to agentic evaluation of large language models. In Forty-second International Conference on Machine Learning, 2025

2025

[26] [26]

Synchromesh: Reliable code generation from pre-trained language models

Poesia, G., Polozov, A., Le, V., Tiwari, A., Soares, G., Meek, C., and Gulwani, S. Synchromesh: Reliable code generation from pre-trained language models. In The Eleventh International Conference on Learning Representations, 2022

2022

[27] [27]

Cold decoding: Energy-based constrained text generation with langevin dynamics

Qin, L., Welleck, S., Khashabi, D., and Choi, Y. Cold decoding: Energy-based constrained text generation with langevin dynamics. In Advances in Neural Information Processing Systems, 2022

2022

[28] [28]

and Juang, B

Rabiner, L. and Juang, B. An introduction to hidden markov models. IEEE ASSP Magazine, 3 0 (1), 1986

1986

[29] [29]

D., Ermon, S., and Finn, C

Rafailov, R., Sharma, A., Mitchell, E., Manning, C. D., Ermon, S., and Finn, C. Direct preference optimization: Your language model is secretly a reward model. In Advances in Neural Information Processing Systems, 2023

2023

[30] [30]

PICARD : Parsing incrementally for constrained auto-regressive decoding from language models

Scholak, T., Schucher, N., and Bahdanau, D. PICARD : Parsing incrementally for constrained auto-regressive decoding from language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2021

2021

[31] [31]

Tractable operations for arithmetic circuits of probabilistic models

Shen, Y., Choi, A., and Darwiche, A. Tractable operations for arithmetic circuits of probabilistic models. Advances in Neural Information Processing Systems, 29, 2016

2016

[32] [32]

A., Pauls, A., Klein, D., Eisner, J., and Van Durme, B

Shin, R., Lin, C., Thomson, S., Chen, C., Roy, S., Platanios, E. A., Pauls, A., Klein, D., Eisner, J., and Van Durme, B. Constrained language models yield few-shot semantic parsers. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2021

2021

[33] [33]

Syncode: LLM generation with grammar augmentation

Ugare, S., Suresh, T., Kang, H., Misailovic, S., and Singh, G. Syncode: LLM generation with grammar augmentation. Transactions on Machine Learning Research, 2025

2025

[34] [34]

A compositional atlas of tractable circuit operations for probabilistic inference

Vergari, A., Choi, Y., Liu, A., Teso, S., and Van den Broeck, G. A compositional atlas of tractable circuit operations for probabilistic inference. Advances in Neural Information Processing Systems, 34: 0 13189--13201, 2021

2021

[35] [35]

W., Lester, B., Du, N., Dai, A

Wei, J., Bosma, M., Zhao, V., Guu, K., Yu, A. W., Lester, B., Du, N., Dai, A. M., and Le, Q. V. Finetuned language models are zero-shot learners. In The Eleventh International Conference on Learning Representations, 2022

2022

[36] [36]

Willard, B. T. and Louf, R. Efficient guided generation for large language models. arXiv e-prints, pp.\ arXiv--2307, 2023

2023

[37] [37]

and Klein, D

Yang, K. and Klein, D. Fudge: Controlled text generation with future discriminators. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2021

2021

[38] [38]

$\tau$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

Yao, S., Shinn, N., Razavi, P., and Narasimhan, K. -bench: A benchmark for tool-agent-user interaction in real-world domains. arXiv preprint arXiv:2406.12045, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[39] [39]

S pider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to- SQL task

Yu, T., Zhang, R., Yang, K., Yasunaga, M., Wang, D., Li, Z., Ma, J., Li, I., Yao, Q., Roman, S., Zhang, Z., and Radev, D. S pider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to- SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018

2018

[40] [40]

Tractable control for autoregressive language generation

Zhang, H., Dang, M., Peng, N., and Van Den Broeck, G. Tractable control for autoregressive language generation. In The Fortieth International Conference on Machine Learning, 2023

2023

[41] [41]

Adaptable logical control for large language models

Zhang, H., Kung, P.-N., Yoshida, M., Van den Broeck, G., and Peng, N. Adaptable logical control for large language models. In Advances in Neural Information Processing Systems, 2024

2024

[42] [42]

Zhang, H., Dang, M., Wang, B., Ermon, S., Peng, N., and den Broeck, G. V. Scaling probabilistic circuits via monarch matrices. In The Forty-second International Conference on Machine Learning, 2025

2025