arxiv: 2605.10065 · v1 · submitted 2026-05-11 · 💻 cs.CL · cs.AI

Recognition: no theorem link

NCO: A Versatile Plug-in for Handling Negative Constraints in Decoding

Hyundong Jin, Yo-Sub Han

Authors on Pith no claims yet

Pith reviewed 2026-05-12 05:04 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords constrained decodingnegative constraintsLLM safetyonline pattern matchingprofanity suppressionPII preventionregex constraintsdecoding strategies

0 comments

The pith

NCO prevents multiple forbidden patterns in LLM generation by performing online pattern matching on individual constraints instead of building one large automaton.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents NCO as a method to enforce negative constraints during the decoding phase of large language models. Negative constraints include forbidden strings or regular expressions that should not appear in the output, such as profanity or personal information. Previous approaches either use post-processing or create a combined automaton for all constraints, but the latter leads to an explosion in the number of states as constraints increase. NCO instead uses online pattern matching to check constraints in real time as text is generated. This approach maintains compatibility with common decoding strategies like sampling and beam search while adding support for soft constraints. If successful, it would allow safer and more controlled text generation with lower computational cost.

Core claim

NCO is a versatile plug-in for constrained decoding that handles multiple hard negative constraints and regex constraints through online pattern matching, thereby avoiding the state explosion associated with constructing a single automaton that tracks all forbidden patterns simultaneously.

What carries the argument

Online pattern matching applied separately to each finite hard constraint and regex constraint during the token-by-token generation process.

If this is right

Multiple constraints can be enforced without the state space growing exponentially with the number of patterns.
NCO integrates directly with standard inference methods including various sampling techniques and beam search.
Soft masking is supported to allow probabilistic suppression of forbidden content rather than hard blocking.
Empirical results show effectiveness in practical applications such as suppressing PII and profanity in generated text.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending NCO to handle context-dependent constraints might require additional state but could still be more efficient than full automata.
The method could be combined with other alignment techniques to provide layered safety in LLMs.
Testing on longer generations or more numerous constraints would reveal if the online matching scales better in practice than claimed.
Potential applications in real-time chat systems where both safety and speed are priorities.

Load-bearing premise

That online pattern matching can detect every instance of a forbidden pattern in the generated sequence without missing occurrences or introducing overhead that cancels out the efficiency gains.

What would settle it

A generated output that contains a forbidden pattern despite NCO being active, or a performance benchmark showing higher total computation time than the combined automaton method for a moderate number of constraints.

Figures

Figures reproduced from arXiv: 2605.10065 by Hyundong Jin, Yo-Sub Han.

**Figure 2.** Figure 2: Overall architecture of NCO. Given forbidden strings and regex patterns, we build the [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Sensitivity analysis of throughput under increasing constraint scale. [PITH_FULL_IMAGE:figures/full_fig_p028_3.png] view at source ↗

read the original abstract

Controlling Large Language Models (LLMs) to prevent the generation of undesirable content, such as profanity and personally identifiable information (PII), has become increasingly critical. While earlier approaches relied on post-processing or resampling, recent research has shifted towards constrained decoding methods that control outputs during generation to mitigate high computational costs and quality degradation. However, preventing multiple forbidden hard constraints or regex constraints from appearing anywhere in the output is computationally challenging. A straightforward solution is to convert these constraints into a single automaton that tracks all forbidden patterns during decoding, but this often becomes impractically large. Standard regex engines also do not readily support the operations needed to build such a constraint, such as complement and intersection. In order to address these limitations, we propose NCO, a decoding strategy that performs online pattern matching over finite hard constraints and regex constraints, reducing computational overhead without inducing state explosion. NCO is fully compatible with standard inference strategies, including various sampling methods and beam search, while also supporting soft masking for probabilistic suppression. We empirically demonstrate its effectiveness across practical tasks, including PII and profanity suppression. Our implementation is available at https://github.com/hyundong98/NCO-Decoding.git .

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

NCO's online pattern matching for negative constraints is a practical idea worth checking, but the abstract leaves the efficiency claims and experiments too thin to judge yet.

read the letter

The core of this paper is NCO, an online pattern matching method that enforces multiple negative constraints (hard strings or regexes like PII or profanity) during LLM decoding without building one giant automaton that tracks everything. It aims to keep overhead low while staying compatible with sampling, beam search, and soft masking for probabilistic suppression. They release code on GitHub, which helps anyone who wants to test it directly.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes NCO, a plug-in decoding strategy for LLMs that enforces negative constraints (finite hard constraints and regex patterns for undesirable content such as PII and profanity) via online pattern matching. It claims this avoids the state explosion of building a single automaton for all constraints, remains compatible with sampling, beam search, and soft masking, and demonstrates empirical effectiveness on PII and profanity suppression tasks.

Significance. If the online matching approach indeed scales without hidden exponential costs or missed patterns, NCO could offer a practical, low-overhead alternative to automaton-based or post-hoc methods for negative constraint enforcement in LLM generation, addressing a common safety need.

major comments (2)

[Method / Algorithm description] The abstract and method description assert that online pattern matching over regex constraints avoids state explosion, but standard NFA simulation for multiple overlapping patterns can still incur up to exponential state growth in the worst case; the manuscript must explicitly detail the data structures, pruning, or optimizations used to bound this (e.g., in the algorithm pseudocode or complexity analysis section) to support the central efficiency claim.
[Experiments] Empirical evaluation on PII and profanity tasks claims effectiveness but provides insufficient detail on baselines, exact metrics (e.g., constraint violation rate, perplexity impact), failure cases, or number of constraints tested; without these, it is difficult to verify that the claimed overhead reduction holds in practice.

minor comments (2)

[Abstract] The abstract mentions compatibility with 'various sampling methods' but does not clarify whether NCO integrates with top-k, nucleus, or temperature sampling without altering their probability distributions.
[Abstract] The GitHub link is provided, but the manuscript should include a brief reproducibility note on how to reproduce the PII/profane suppression experiments.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment below and have revised the paper to incorporate additional details on the algorithmic approach and experimental setup.

read point-by-point responses

Referee: [Method / Algorithm description] The abstract and method description assert that online pattern matching over regex constraints avoids state explosion, but standard NFA simulation for multiple overlapping patterns can still incur up to exponential state growth in the worst case; the manuscript must explicitly detail the data structures, pruning, or optimizations used to bound this (e.g., in the algorithm pseudocode or complexity analysis section) to support the central efficiency claim.

Authors: We thank the referee for this observation. NCO performs independent online pattern matching for each constraint separately using dedicated state machines (one per hard constraint or regex), rather than building or simulating a single combined automaton. This avoids the exponential costs of intersections or unions over multiple patterns. The total state is the sum of per-constraint states, which remains linear in the number of constraints. We have added explicit pseudocode (revised Algorithm 1) and a complexity analysis subsection in Section 3 that details the data structures (per-matcher state sets with no cross-constraint merging) and proves the bound. These changes directly support the efficiency claim. revision: yes
Referee: [Experiments] Empirical evaluation on PII and profanity tasks claims effectiveness but provides insufficient detail on baselines, exact metrics (e.g., constraint violation rate, perplexity impact), failure cases, or number of constraints tested; without these, it is difficult to verify that the claimed overhead reduction holds in practice.

Authors: We agree that greater experimental detail is needed for full verification. The revised manuscript expands the experiments section to specify all baselines (including post-processing and resampling methods), exact metrics (constraint violation rate, perplexity delta, generation latency), the precise number of constraints tested per task, and a dedicated subsection analyzing failure cases. These additions confirm the reported overhead reductions while maintaining compatibility with sampling and beam search. revision: yes

Circularity Check

0 steps flagged

No circularity: algorithmic proposal is self-contained

full rationale

The paper presents NCO as a new decoding strategy that performs online pattern matching over finite hard constraints and regex constraints to avoid state explosion in constrained generation. No equations, derivations, or fitted parameters are described that reduce to the inputs by construction. The core claim rests on an independent algorithmic design, compatibility with standard inference (sampling, beam search), and empirical validation on PII/profanity tasks rather than any self-citation chain or self-definitional loop. This is the expected non-finding for an applied algorithmic contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on standard concepts from automata theory and pattern matching without introducing new free parameters, axioms beyond domain standards, or invented entities.

axioms (1)

standard math Finite automata and regex can represent forbidden patterns for string matching
Invoked implicitly when discussing constraint representation and state explosion.

pith-pipeline@v0.9.0 · 5501 in / 1069 out tokens · 43296 ms · 2026-05-12T05:04:32.650357+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 3 internal anchors

[1]

Hewett, Mojan Javaheripi, Piero Kauffmann, et al

Marah Abdin, Jyoti Aneja, Harkirat Behl, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Harrison, Russell J. Hewett, Mojan Javaheripi, Piero Kauffmann, et al. Phi-4 technical report, 2024

work page 2024
[2]

Efficient string matching: an aid to bibliographic search

Alfred V Aho and Margaret J Corasick. Efficient string matching: an aid to bibliographic search. Communications of the ACM, 18(6):333–340, 1975

work page 1975
[3]

The Falcon series of open language models, 2023

Ebtesam Almazrouei, Hamza Alobeidli, Abdulaziz Alshamsi, Alessandro Cappelli, Ruxan- dra Cojocaru, Merouane Debbah, Etienne Goffinet, Daniel Hesslow, Julien Launay, Quentin Malartic, et al. The Falcon series of open language models, 2023

work page 2023
[4]

Guiding llms the right way: Fast, non- invasive constrained generation

Luca Beurer-Kellner, Marc Fischer, and Martin Vechev. Guiding llms the right way: Fast, non- invasive constrained generation. InForty-first International Conference on Machine Learning, 2024

work page 2024
[5]

Pythia: A suite for analyzing large language models across training and scaling

Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O’Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, and Oskar van der Wal. Pythia: A suite for analyzing large language models across training and scaling. InProceedings of the 40th International Conferen...

work page 2023
[6]

GPT-Neo: Large scale autoregressive language modeling, 2021

Sid Black, Leo Gao, Phil Wang, Connor Leahy, and Stella Biderman. GPT-Neo: Large scale autoregressive language modeling, 2021

work page 2021
[7]

Tokenization as finite-state transduction.Computational Linguistics, pages 1–35, 2025

Marco Cognetta and Naoaki Okazaki. Tokenization as finite-state transduction.Computational Linguistics, pages 1–35, 2025

work page 2025
[8]

Guard: Generation-time llm unlearning via adaptive restriction and detection

Zhijie Deng, Chris Yuhao Liu, Zirui Pang, Xinlei He, Lei Feng, Qi Xuan, Zhaowei Zhu, and Jiaheng Wei. Guard: Generation-time llm unlearning via adaptive restriction and detection. arXiv preprint arXiv:2505.13312, 2025

work page arXiv 2025
[9]

A general-purpose algorithm for con- strained sequential inference

Daniel Deutsch, Shyam Upadhyay, and Dan Roth. A general-purpose algorithm for con- strained sequential inference. InProceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 482–492, 2019

work page 2019
[10]

XGram- mar: Efficient Structured Generation via Grammar-Constrained Decoding,

Yixin Dong, Charlie F. Ruan, Yaxing Cai, Ruihang Lai, Ziyi Xu, Yilong Zhao, and Tianqi Chen. Xgrammar: Flexible and efficient structured generation engine for large language models.arXiv preprint arXiv.2411.15100, 2024

work page arXiv 2024
[11]

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, and Connor Leahy. The pile: An 800gb dataset of diverse text for language modeling.arXiv preprint arXiv:2101.00027, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2020
[12]

Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, and Noah A. Smith. Real- ToxicityPrompts: Evaluating neural toxic degeneration in language models. InFindings of the Association for Computational Linguistics: EMNLP 2020, pages 3356–3369, 2020

work page 2020
[13]

Grammar-constrained decoding for structured nlp tasks without finetuning

Saibo Geng, Martin Josifoski, Maxime Peyrard, and Robert West. Grammar-constrained decoding for structured nlp tasks without finetuning. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

work page 2023
[14]

The Llama 3 herd of models, 2024

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The Llama 3 herd of models, 2024

work page 2024
[15]

Regexpspace: A benchmark for evaluating llm reasoning on pspace-complete regex problems, 2025

Hyundong Jin, Joonghyuk Hahn, and Yo-Sub Han. Regexpspace: A benchmark for evaluating llm reasoning on pspace-complete regex problems, 2025. URL https://arxiv.org/abs/ 2510.09227. 10

work page arXiv 2025
[16]

The enron corpus: A new dataset for email classification research

Bryan Klimt and Yiming Yang. The enron corpus: A new dataset for email classification research. InEuropean conference on machine learning, pages 217–226, 2004

work page 2004
[17]

Automata-based constraints for language model decoding

Terry Koo, Frederick Liu, and Luheng He. Automata-based constraints for language model decoding. InFirst Conference on Language Modeling, 2024

work page 2024
[18]

Neurologic decoding:(un) supervised neural text generation with predicate logic constraints

Ximing Lu, Peter West, Rowan Zellers, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. Neurologic decoding:(un) supervised neural text generation with predicate logic constraints. InProceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human language technologies, pages 4288–4299, 2021

work page 2021
[19]

Smith, and Yejin Choi

Ximing Lu, Sean Welleck, Peter West, Liwei Jiang, Jungo Kasai, Daniel Khashabi, Ronan Le Bras, Lianhui Qin, Youngjae Yu, Rowan Zellers, Noah A. Smith, and Yejin Choi. Neurologic a* esque decoding: Constrained text generation with lookahead heuristics. InProceedings of the 2022 Conference of the North American Chapter of the Association for Computational L...

work page 2022
[20]

Flexible and efficient grammar-constrained decoding.arXiv preprint arXiv:2502.05111, 2025

Kanghee Park, Timothy Zhou, and Loris D’Antoni. Flexible and efficient grammar-constrained decoding.arXiv preprint arXiv:2502.05111, 2025

work page arXiv 2025
[21]

Synchromesh: Reliable code generation from pre-trained language models

Gabriel Poesia, Alex Polozov, Vu Le, Ashish Tiwari, Gustavo Soares, Christopher Meek, and Sumit Gulwani. Synchromesh: Reliable code generation from pre-trained language models. In The Tenth International Conference on Learning Representations, 2022, Virtual Event, April 25-29, 2022, 2022

work page 2022
[22]

Picard: Parsing incrementally for constrained auto-regressive decoding from language models

Torsten Scholak, Nathan Schucher, and Dzmitry Bahdanau. Picard: Parsing incrementally for constrained auto-regressive decoding from language models. InProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9895–9901, 2021

work page 2021
[23]

List of dirty, naughty, obscene, and otherwise bad words

Shutterstock. List of dirty, naughty, obscene, and otherwise bad words. https://github. com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words, 2012

work page 2012
[24]

Introduction to the theory of computation.ACM Sigact News, 27(1):27–29, 1996

Michael Sipser. Introduction to the theory of computation.ACM Sigact News, 27(1):27–29, 1996

work page 1996
[25]

Hashimoto

Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B. Hashimoto. Stanford alpaca: An instruction-following llama model. https://github.com/tatsu-lab/stanford_alpaca, 2023

work page 2023
[26]

Welcome to the Falcon 3 family of open models! https: //huggingface.co/blog/falcon3, 2024

Technology Innovation Institute. Welcome to the Falcon 3 family of open models! https: //huggingface.co/blog/falcon3, 2024

work page 2024
[27]

Llama 2: Open Foundation and Fine-Tuned Chat Models

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models, 2023. URL https://arxiv.org/abs/2307.09288

work page internal anchor Pith review Pith/arXiv arXiv 2023
[28]

GPT-J-6B: A 6 billion parameter autoregressive language model.https://github.com/kingoflolz/mesh-transformer-jax, May 2021

Ben Wang and Aran Komatsuzaki. GPT-J-6B: A 6 billion parameter autoregressive language model.https://github.com/kingoflolz/mesh-transformer-jax, May 2021

work page 2021
[29]

Efficient Guided Generation for Large Language Models

Brandon T Willard and Rémi Louf. Efficient guided generation for large language models. arXiv preprint arXiv:2307.09702, 2023

work page internal anchor Pith review arXiv 2023
[30]

Qwen2.5 technical report, 2024

An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, et al. Qwen2.5 technical report, 2024

work page 2024
[31]

Abstract reasoning via logic-guided generation.arXiv preprint arXiv:2107.10493, 2021

Sihyun Yu, Sangwoo Mo, Sungsoo Ahn, and Jinwoo Shin. Abstract reasoning via logic-guided generation.arXiv preprint arXiv:2107.10493, 2021. A Preliminaries We briefly review the background relevant to our method, including regular languages, BPE tok- enization, the closure properties of regular languages under intersection and complement, and the Aho-Coras...

work page arXiv 2021