Earl T. Barr, Mark Harman, Phil McMinn, Muzammil Shahbaz, and Shin Yoo

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

RESTestBench: A Benchmark for Evaluating the Effectiveness of LLM-Generated REST API Test Cases from NL Requirements

cs.SE · 2026-04-28 · unverdicted · novelty 7.0

RESTestBench shows that LLM-generated REST API test effectiveness drops when interacting with faulty or mutated code, especially for vague requirements, indicating that high-detail requirements make direct SUT interaction unnecessary.

VISOR: A Vision-Language Model-based Test Oracle for Testing Robots

cs.SE · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

VISOR is a VLM-based automated test oracle that evaluates robot task correctness and quality from videos while reporting its own uncertainty, tested on GPT and Gemini across four tasks and over 1000 videos with Gemini showing higher recall and GPT higher precision but low uncertainty-correctness tie

Treating Run-time Execution History as a First-Class Citizen: Co-Versioning Run-time Behavior alongside Code

cs.SE · 2026-04-18 · conditional · novelty 6.0

Behavioral Co-Versioning couples Git history with a queryable Behavioral Archive of run-time observations to enable semantic diffing and behavior-aware analysis of software evolution.

Towards Reliable Testing of Machine Unlearning

cs.LG · 2026-04-16 · unverdicted · novelty 6.0

Causal fuzzing with budgeted interventions can detect residual direct and indirect influence of unlearned data that standard attribution methods miss due to proxies, cancellations, and masking.

citing papers explorer

Showing 4 of 4 citing papers.

RESTestBench: A Benchmark for Evaluating the Effectiveness of LLM-Generated REST API Test Cases from NL Requirements
cs.SE · 2026-04-28 · unverdicted · none · ref 10
VISOR: A Vision-Language Model-based Test Oracle for Testing Robots
cs.SE · 2026-05-11 · unverdicted · none · ref 7 · 2 links
Treating Run-time Execution History as a First-Class Citizen: Co-Versioning Run-time Behavior alongside Code
cs.SE · 2026-04-18 · conditional · none · ref 9
Towards Reliable Testing of Machine Unlearning
cs.LG · 2026-04-16 · unverdicted · none · ref 3
