Copilot Arena: A Platform for Code LLM Evaluation in the Wild. arXiv preprint arXiv:2502.09328

Wayne Chi, Valerie Chen, Anastasios Nikolas Angelopoulos, Wei-Lin Chiang, Aditya Mittal, Naman Jain, Tianjun Zhang, Ion Stoica, Chris Donahue, Ameet Talwalkar · 2025 · arXiv 2502.09328

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Edit, But Verify: An Empirical Audit of Instructed Code-Editing Benchmarks

cs.SE · 2026-04-06 · conditional · novelty 8.0

The two main benchmarks for LLM instructed code editing over-represent Python, miss common real-world domains and edit types, and have test coverage issues that limit what they measure.

RECAP: An End-to-End Platform for Capturing, Replaying, and Analyzing AI-Assisted Programming Interactions

cs.SE · 2026-05-01 · unverdicted · novelty 6.0

RECAP captures, replays, and analyzes AI-assisted programming sessions by linking prompts, edits, and developer actions in a single timeline.

Mercury: Ultra-Fast Language Models Based on Diffusion

cs.CL · 2025-06-17 · unverdicted · novelty 6.0

Mercury Coder diffusion LLMs achieve throughputs of 1109 and 737 tokens per second on H100 GPUs, up to 10x faster than frontier models with comparable quality.

citing papers explorer

Edit, But Verify: An Empirical Audit of Instructed Code-Editing Benchmarks

cs.SE · 2026-04-06 · conditional · none · ref 6

RECAP: An End-to-End Platform for Capturing, Replaying, and Analyzing AI-Assisted Programming Interactions

cs.SE · 2026-05-01 · unverdicted · none · ref 8

Mercury: Ultra-Fast Language Models Based on Diffusion

cs.CL · 2025-06-17 · unverdicted · none · ref 11
