STELLAR: A search-based testing framework for large language model applications
2 Pith papers cite this work.
Fields: cs.SE (2)
Years: 2026 (2)
Representative citing papers
- Measuring and Exploiting Contextual Bias in LLM-Assisted Security Code Review
  LLM-based security code review is vulnerable to framing bias, with a novel iterative refinement attack achieving 100% success in reintroducing vulnerabilities across real projects.
- Automated Self-Testing as a Quality Gate: Evidence-Driven Release Management for LLM Applications
  An automated self-testing framework with evidence-based quality gates for LLM application releases was evaluated in a longitudinal case study of a multi-agent conversational AI system, identifying rollback builds and supporting stable quality over four weeks.