14th Intl

· 2009 · arXiv 8244.150827

3 Pith papers cite this work.

representative citing papers

Are Performance-Optimization Benchmarks Reliably Measuring Coding Agents?

cs.SE · 2026-07-01 · unverdicted · novelty 6.0

Audit of GSO, SWE-Perf and SWE-fficiency reveals that reference patches satisfy validity rules across machines for only 39/102, 11/140 and 411/498 tasks respectively, public submissions beat references on 85.3% of replay-valid tasks, and scoring rules cause ranking disagreements.

Exploring Statistical Change Point Detection Techniques for Performance Anomaly Detection at Mozilla

cs.SE · 2026-06-16 · unverdicted · novelty 5.0

Ensemble voting strategies for change point detection improve F1-score by 11% over Mozilla's T-test method on a new ground-truth dataset of 174 performance time series annotated by practitioners.

Misleading Microbenchmarks on the Java Virtual Machines

cs.PL · 2026-05-22 · unverdicted · novelty 4.0

Microbenchmarks on the JVM can produce misleading results due to unrealistic profiles collected during isolated execution despite following JMH guidelines.

citing papers explorer

Are Performance-Optimization Benchmarks Reliably Measuring Coding Agents? · cs.SE · 2026-07-01 · unverdicted · none · ref 15
Exploring Statistical Change Point Detection Techniques for Performance Anomaly Detection at Mozilla · cs.SE · 2026-06-16 · unverdicted · none · ref 66
Misleading Microbenchmarks on the Java Virtual Machines · cs.PL · 2026-05-22 · unverdicted · none · ref 29
