ACM Computing Surveys
2 Pith papers cite this work, both from 2026.
Citing papers
- ProcCtrlBench: Evaluating Process-Level Defects and Control Preservation in LLM Coding Agents
  ProcCtrlBench introduces an ontology of 11 defect types across 4 categories, plus control preservation metrics, to evaluate LLM coding agent trajectories on 200 cases from AndroidBench, TerminalBench, and SWE-bench-Verified.
- Don't Collapse Your Features: Why CenterLoss Hurts OOD Detection and Multi-Scale Mahalanobis Wins
  Avoiding CenterLoss improves OOD detection via multi-scale Mahalanobis on L2-normalized features, yielding 0.9483 AUROC on CIFAR-10 while preserving competitive in-distribution accuracy.