In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14691–14714, Bangkok, Thailand

Mission: Impossible Language Models

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

cs.CL · 2025-06-27 · unverdicted · novelty 6.0

RExBench is a new benchmark showing that LLM coding agents fail to autonomously implement most realistic research extensions to prior AI papers.

RExBench: Can coding agents autonomously implement AI research extensions? cs.CL · 2025-06-27 · unverdicted · none · ref 22