LibEvolutionEval: A benchmark and study for version-specific code generation

Sachit Kuhar, Wasi Uddin Ahmad, Zijian Wang, Nihal Jain, Haifeng Qian, Baishakhi Ray, Murali Krishna Ramanathan, Xiaofei Ma, Anoop Deoras · 2025 · arXiv 2412.04478

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

LibEvoBench: Probing Temporal Knowledge Stratification in Code Generation Models

cs.SE · 2026-06-24 · unverdicted · novelty 7.0

The LibEvoBench benchmark shows that LLMs are version-oblivious on evolving APIs: supplying documentation helps, but specifying the version does not.

Knowledge Boundary Probing and Demand-Guided Intervention for LLM-Based Power System Code Generation

cs.SE · 2026-05-29 · unverdicted · novelty 7.0

PowerCodeBench and a boundary-aware intervention raise LLM accuracy on power-system code generation by 32-56 points across ten open-weight models and four commercial APIs on a 2,000-task benchmark.

citing papers explorer

Showing 2 of 2 citing papers.

LibEvoBench: Probing Temporal Knowledge Stratification in Code Generation Models

cs.SE · 2026-06-24 · unverdicted · none · ref 61

The LibEvoBench benchmark shows that LLMs are version-oblivious on evolving APIs: supplying documentation helps, but specifying the version does not.

Knowledge Boundary Probing and Demand-Guided Intervention for LLM-Based Power System Code Generation

cs.SE · 2026-05-29 · unverdicted · none · ref 10

PowerCodeBench and a boundary-aware intervention raise LLM accuracy on power-system code generation by 32-56 points across ten open-weight models and four commercial APIs on a 2,000-task benchmark.