arxiv: 2604.00491 · v2 · pith:OQSXWL2Nnew · submitted 2026-04-01 · 💻 cs.PL · cs.AI · cs.SE

Executing as You Generate: Hiding Execution Latency in LLM Code Interpreters

classification 💻 cs.PL · cs.AI · cs.SE
keywords execution · code · latency · generation · during · eager · end-to-end · executes

Current LLM systems are increasingly equipped with a code interpreter that executes generated code to obtain results. This works serially: the model first generates the complete code, then an interpreter executes it. This sequential workflow leaves the executor idle during generation and the generator idle during execution, resulting in unnecessary end-to-end latency. Our key observation is that an LLM, unlike a human developer, emits code tokens left to right and does not backtrack over what it has already written. This makes it possible to start executing a piece of code while later tokens are still being generated. We formalize this parallel execution paradigm, modeling it as a three-stage pipeline of generation, detection, and execution, and derive closed-form latency bounds that characterize its speedup potential and operating regimes. We then present EAGER, a concrete implementation featuring AST-based chunking, dynamic batching with gated execution, and early error interruption. We evaluate EAGER across four benchmarks, seven LLMs, and three execution environments. The overlap mechanism hides almost all execution behind generation, reducing the non-overlapped portion of execution time by up to 99.8% and cutting end-to-end latency by up to 37.3% on error-free runs.
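
To make the generation, detection, and execution stages concrete, here is a minimal Python sketch of AST-based chunking with overlapped execution. It is not EAGER's implementation: the names `complete_chunks` and `run_eagerly` are hypothetical, the code runs in-process through `exec` rather than in a sandboxed interpreter, and dynamic batching and gated execution are left out. It assumes the generated code is Python with `\n` line endings, and it conservatively treats a top-level statement as complete only once a later top-level statement has started after it.

```python
import ast
import queue
import threading


def _start_line(node):
    """First source line of a top-level statement, including decorators."""
    decorators = getattr(node, "decorator_list", [])
    return min([node.lineno] + [d.lineno for d in decorators])


def complete_chunks(tokens):
    """Detection stage: yield source chunks whose statements are complete.

    The last parsed statement is always held back, because an indented body
    or an else/except clause may still extend it.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        if "\n" not in token:
            continue  # only attempt a parse when a line has just ended
        head, newline, tail = buffer.rpartition("\n")
        try:
            body = ast.parse(head + newline).body
        except SyntaxError:
            continue  # incomplete (or invalid) so far; wait for more tokens
        if len(body) < 2:
            continue
        split = _start_line(body[-1]) - 1  # lines before the last statement
        if split > 0:
            lines = (head + newline).splitlines(keepends=True)
            yield "".join(lines[:split])
            buffer = "".join(lines[split:]) + tail
    if buffer.strip():
        yield buffer  # generation finished: flush whatever remains


def run_eagerly(tokens):
    """Execute chunks in a worker thread while tokens are still arriving."""
    namespace = {}
    executed = []
    state = {"error": None}
    pending = queue.Queue()
    failed = threading.Event()

    def worker():
        while True:
            chunk = pending.get()
            if chunk is None:
                return  # sentinel: no more chunks
            try:
                exec(compile(chunk, "<generated>", "exec"), namespace)
                executed.append(chunk)
            except Exception as exc:
                state["error"] = exc
                failed.set()
                return  # later chunks are never executed

    thread = threading.Thread(target=worker)
    thread.start()
    for chunk in complete_chunks(tokens):
        if failed.is_set():
            break  # early error interruption: stop consuming the generator
        pending.put(chunk)
    pending.put(None)
    thread.join()
    return namespace, executed, state["error"]
```

Calling `run_eagerly(token_iterator)` with any iterable of string fragments returns the final namespace, the list of chunks that ran, and the first exception raised (or `None`). Holding back the last statement trades a little overlap for safety: a chunk is never executed while later tokens could still change its meaning.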

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Are Performance-Optimization Benchmarks Reliably Measuring Coding Agents?

    cs.SE · 2026-07 · unverdicted · novelty 6.0

    Audit of GSO, SWE-Perf and SWE-fficiency reveals that reference patches satisfy validity rules across machines for only 39/102, 11/140 and 411/498 tasks respectively, public submissions beat references on 85.3% of rep...