A critique and improvement of the
3 Pith papers cite this work.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- The Alignment Problem in Constrained Code Generation
  Incomplete constrainers in constrained decoding push LLMs into low-probability program regions, making unconstrained decoding outperform constrained decoding on functional correctness across seven models and three benchmarks.
- Search-based Testing of Vision Language Models for In-Car Scene Understanding
  ISU-Test combines rendering-based scene generation with search-based testing to produce up to 10x higher failure rates and 3.6x higher failure coverage in VLMs for in-car scene understanding compared to random generation.
- Failure-Based Testing for Deep Reinforcement Learning Agents
  Proposes Prior Random Testing (PRT) that leverages task difficulty to prioritize failure-prone test cases for DRL agents, achieving over 50% lower testing cost than random testing while preserving diversity on four benchmarks.