STELLAR: A search-based testing framework for large language model applications

Duygu Cetinkaya et al. · 2026 · arXiv 2601.00497

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Measuring and Exploiting Contextual Bias in LLM-Assisted Security Code Review

cs.SE · 2026-03-19 · accept · novelty 7.0

LLM-based security code review is vulnerable to framing bias, with a novel iterative refinement attack achieving 100% success in reintroducing vulnerabilities across real projects.

Automated Self-Testing as a Quality Gate: Evidence-Driven Release Management for LLM Applications

cs.SE · 2026-03-13 · unverdicted · novelty 5.0

An automated self-testing framework with evidence-based quality gates for LLM application releases was evaluated in a longitudinal case study of a multi-agent conversational AI system, identifying rollback builds and supporting stable quality over four weeks.

citing papers explorer

Showing 2 of 2 citing papers.

Measuring and Exploiting Contextual Bias in LLM-Assisted Security Code Review
cs.SE · 2026-03-19 · accept · none · ref 67
Automated Self-Testing as a Quality Gate: Evidence-Driven Release Management for LLM Applications
cs.SE · 2026-03-13 · unverdicted · none · ref 5
