A critique and improvement of the

András Vargha, Harold D · 2000 · arXiv stable/1165329

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

The Alignment Problem in Constrained Code Generation

cs.SE · 2026-06-19 · unverdicted · novelty 7.0

Incomplete constrainers in constrained decoding push LLMs into low-probability program regions, making unconstrained decoding outperform constrained decoding on functional correctness across seven models and three benchmarks.

Search-based Testing of Vision Language Models for In-Car Scene Understanding

cs.CV · 2026-07-02 · unverdicted · novelty 6.0

ISU-Test combines rendering-based scene generation with search-based testing to produce up to 10x higher failure rates and 3.6x higher failure coverage in VLMs for in-car scene understanding compared to random generation.

Failure-Based Testing for Deep Reinforcement Learning Agents

cs.SE · 2026-06-30 · unverdicted · novelty 6.0

Proposes Prior Random Testing (PRT), which uses task difficulty to prioritize failure-prone test cases for DRL agents, cutting testing cost by more than 50% relative to random testing while preserving diversity on four benchmarks.

citing papers explorer

The Alignment Problem in Constrained Code Generation

cs.SE · 2026-06-19 · unverdicted · none · ref 47

Search-based Testing of Vision Language Models for In-Car Scene Understanding

cs.CV · 2026-07-02 · unverdicted · none · ref 26

Failure-Based Testing for Deep Reinforcement Learning Agents

cs.SE · 2026-06-30 · unverdicted · none · ref 39
