The Impact of AI on Developer Productivity: Evidence from GitHub Copilot
33 Pith papers cite this work. Polarity classification is still indexing.
abstract
Generative AI tools hold promise to increase human productivity. This paper presents results from a controlled experiment with GitHub Copilot, an AI pair programmer. Recruited software developers were asked to implement an HTTP server in JavaScript as quickly as possible. The treatment group, with access to the AI pair programmer, completed the task 55.8% faster than the control group. Observed heterogeneous effects show promise for AI pair programmers to help people transition into software development careers.
claims ledger
- abstract: Generative AI tools hold promise to increase human productivity. This paper presents results from a controlled experiment with GitHub Copilot, an AI pair programmer. Recruited software developers were asked to implement an HTTP server in JavaScript as quickly as possible. The treatment group, with access to the AI pair programmer, completed the task 55.8% faster than the control group. Observed heterogeneous effects show promise for AI pair programmers to help people transition into software development careers.
co-cited works
representative citing papers
- A network analysis of software mentions in 1.3 million papers identifies 520 tools in eight communities and shows disciplines maintain distinct, stable tool portfolios that are crystallizing toward common sets.
- The Mise en Place methodology uses contextual grounding, collaborative specification, and task decomposition to prepare AI agents for coding tasks, demonstrated in a hackathon where two hours of prep enabled rapid parallel development of a full-stack platform.
- Generative AI boosted solo entrepreneurial entry on Product Hunt after ChatGPT but teams still dominate the top quality tiers.
- SynConfRoute routes code completions using syntax validation and token confidence, improving pass@1 by up to 31% on hard tasks and reducing accelerator usage by 58% versus always using the largest model.
- Freelancers use generative AI to support exploratory skill acquisition but not as their main resource due to reliability issues, leading to a shift toward survival-oriented upskilling and the emergence of invisible competencies that lack market validation.
- SpecValidator detects lexical vagueness, under-specification, and syntax-formatting defects in LLM code-generation prompts with F1 0.804, outperforming GPT-5-mini and Claude Sonnet 4, and shows that under-specification is the most damaging defect type while richer benchmarks are more resilient.
- A game-theoretic model shows that individually rational adoption of generative AI causes model collapse that reduces collective social welfare for important tasks, with habit formation creating spillovers from low-stakes to high-value domains.
- BONSAI introduces a four-layer architecture and four-phase workflow for human-AI co-development of visual analytics applications, shown in case studies to enable efficient novel tool creation and reconstruction from paper descriptions.
- Co-locating tests with implementation code yields substantially higher preservation and correctness in foundation-model-generated programs than separated test syntax.
- LLMs produce executable code only 42.55% of the time under API evolution without full documentation, improving to 66.36% with structured docs and by 11% more with reasoning strategies, yet outdated patterns persist.
- REAgent improves LLM patch generation for software issues by 17.4% on average through automated construction, quality checking, and iterative refinement of structured issue-oriented requirements.
- StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.
- Reverie is a new AI-powered game that reduced stress levels in a pilot study of 20 students while providing excellent user experience and improved cognitive emotion regulation.
- Meta-analysis of 23 studies shows moderate productivity gains from GenAI coding assistants (Hedges' g=0.33) but no significant effect on learning (g=0.14).
- The Productivity-Reliability Paradox arises because AI code generators produce variable output while developers lack sufficient specification discipline, making governance models focused on specifications the binding constraint rather than model improvements.
- Agentic AI systems are shifting software engineering from line-level code generation to delegated repository-scale execution under supervision, with SWE-bench performance rising from 1.96% to 78.4% and productivity gains of 13.6-55.8%.
- Among novice programmers using AI code generators, trust did not predict compliance with suggestions, while performance correlated with both compliance and increased subsequent trust.
- AI-native software ecosystems exhibit emergent behaviors best explained by complex adaptive systems theory, requiring new ecosystem-level monitoring and seven testable propositions that may extend or replace Lehman's laws.
- Agentic Consensus replaces code as the main artifact with a typed property graph world model that maintains commitments and evidence through synchronization operators, shifting evaluation to alignment fidelity and consensus entropy.
- Sema Code decouples AI coding agents into a programmable npm library with eight mechanisms for isolation, queuing, compression, scheduling, permissions, and integration.
- The AI Codebase Maturity Model defines six sequential levels of AI-driven development based on feedback loop topologies, validated by experience reports showing 5x PR and 37x issue throughput gains from level 2 to level 6.
- Collaborative ML reproducibility requires socio-technical interactional support beyond artifacts, demonstrated via a clinical deployment and addressed by a proposed two-layer system with an AI semantic interface.
- EcoAssist embeds energy estimation and optimization into AI-assisted frontend coding, reducing website energy use by 13-16% in benchmarks while preserving developer productivity.
citing papers explorer
- Context-Augmented Code Generation: How Product Context Improves AI Coding Agent Decision Compliance by 49%
  Adding product context retrieval to AI coding agents raises decision compliance from 46% to 95% on a new benchmark of 8 tasks with 41 weighted decision points.
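As an illustration of the metric this summary reports, a weighted decision-compliance score can be computed as the weight of the decision points an agent satisfied over the total weight. The sketch below is an assumption about how such a benchmark metric is typically scored, not the paper's actual definition; the weights and decisions are hypothetical.

```python
def compliance_score(decisions):
    """Weighted compliance for one task run.

    decisions: list of (weight, complied) pairs, one per decision point.
    Returns the fraction of total weight the agent complied with.
    """
    total = sum(weight for weight, _ in decisions)
    met = sum(weight for weight, complied in decisions if complied)
    return met / total

# Hypothetical run: weights emphasize the more important decisions.
run = [(3, True), (2, True), (1, False)]
print(round(compliance_score(run), 2))  # prints 0.83
```

Aggregating such per-task scores across the benchmark's 8 tasks would yield the overall compliance figures (46% without context, 95% with it) quoted above.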
- Mise en Place for Agentic Coding: Deliberate Preparation as Context Engineering Methodology
  The Mise en Place methodology uses contextual grounding, collaborative specification, and task decomposition to prepare AI agents for coding tasks, demonstrated in a hackathon where two hours of prep enabled rapid parallel development of a full-stack platform.
- SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs
  SynConfRoute routes code completions using syntax validation and token confidence, improving pass@1 by up to 31% on hard tasks and reducing accelerator usage by 58% versus always using the largest model.
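The routing idea summarized above can be sketched roughly: accept the small model's completion only when it parses and was generated with high token confidence, otherwise escalate to the large model. This is a minimal illustrative sketch; the function names, threshold, and confidence aggregation below are assumptions, not SynConfRoute's actual design.

```python
import ast
import math

def is_valid_python(code: str) -> bool:
    """Syntax check: does the completion parse as Python?"""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def mean_confidence(token_logprobs: list) -> float:
    """Average per-token probability from the small model's logprobs."""
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)

def route(completion: str, token_logprobs: list, threshold: float = 0.8) -> str:
    """Keep the small model's answer only if it parses and the model
    was confident; otherwise route the request to the large model."""
    if is_valid_python(completion) and mean_confidence(token_logprobs) >= threshold:
        return "small"
    return "large"

print(route("x = 1", [-0.01, -0.02]))  # prints small
print(route("def f(:", [-0.01]))       # prints large (syntax error)
```

The appeal of this kind of gate is that both checks are cheap relative to a second forward pass through a much larger model.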
- Defective Task Descriptions in LLM-Based Code Generation: Detection and Analysis
  SpecValidator detects lexical vagueness, under-specification, and syntax-formatting defects in LLM code-generation prompts with F1 0.804, outperforming GPT-5-mini and Claude Sonnet 4, and shows that under-specification is the most damaging defect type while richer benchmarks are more resilient.
- Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation
  Co-locating tests with implementation code yields substantially higher preservation and correctness in foundation-model-generated programs than separated test syntax.
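The layout contrast studied here can be shown in a few lines; the function and test below are invented examples, not taken from the paper.

```python
# Co-located style: the unit test sits directly beside the code it
# exercises, so a model (or reader) sees specification and
# implementation in one context.

def slugify(title: str) -> str:
    """Lowercase a title and replace spaces with hyphens."""
    return title.strip().lower().replace(" ", "-")

def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Trim Me ") == "trim-me"

# The separated style would put test_slugify in its own file
# (e.g. a tests/ directory), removing the adjacent specification
# from the generation context.

test_slugify()  # passes when run inline
```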
- When LLMs Lag Behind: Knowledge Conflicts from Evolving APIs in Code Generation
  LLMs produce executable code only 42.55% of the time under API evolution without full documentation, improving to 66.36% with structured docs and by 11% more with reasoning strategies, yet outdated patterns persist.
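A concrete instance of this kind of knowledge conflict exists in Python's own standard library (an illustrative example, not one from the paper): the abstract container classes moved from `collections` to `collections.abc` in Python 3.3, and the old aliases were removed in 3.10, so a model trained mostly on older code may still emit the pattern that no longer executes.

```python
# Outdated pattern an LLM may still emit; raises ImportError on
# Python 3.10+:
#   from collections import Iterable

# Current pattern, correct since Python 3.3:
from collections.abc import Iterable

def flatten(items):
    """Recursively flatten nested iterables (strings stay whole)."""
    for item in items:
        if isinstance(item, Iterable) and not isinstance(item, str):
            yield from flatten(item)
        else:
            yield item

print(list(flatten([1, [2, [3, 4]], "ab"])))  # prints [1, 2, 3, 4, 'ab']
```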
- REAgent: Requirement-Driven LLM Agents for Software Issue Resolution
  REAgent improves LLM patch generation for software issues by 17.4% on average through automated construction, quality checking, and iterative refinement of structured issue-oriented requirements.
- StarCoder 2 and The Stack v2: The Next Generation
  StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.
- A meta-analysis of the effect of generative AI on productivity and learning in programming
  Meta-analysis of 23 studies shows moderate productivity gains from GenAI coding assistants (Hedges' g=0.33) but no significant effect on learning (g=0.14).
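Hedges' g, the effect size this meta-analysis aggregates, is Cohen's d with a small-sample bias correction. A minimal sketch, using made-up sample statistics for two groups:

```python
import math

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Hedges' g for two independent groups: Cohen's d scaled by the
    small-sample correction factor J = 1 - 3/(4*df - 1)."""
    df = n1 + n2 - 2
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    d = (m1 - m2) / s_pooled
    j = 1 - 3 / (4 * df - 1)  # corrects upward bias at small n
    return j * d

# Hypothetical study: treatment mean 10 (sd 2, n 20) vs control 9 (sd 2, n 20).
print(round(hedges_g(10, 2, 20, 9, 2, 20), 3))  # prints 0.49
```

On the conventional scale where 0.2 is small and 0.5 medium, the reported g=0.33 for productivity is a moderate effect, while g=0.14 for learning is small and not statistically distinguishable from zero.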
- The Productivity-Reliability Paradox: Specification-Driven Governance for AI-Augmented Software Development
  The Productivity-Reliability Paradox arises because AI code generators produce variable output while developers lack sufficient specification discipline, making governance models focused on specifications the binding constraint rather than model improvements.
- Agentic AI in the Software Development Lifecycle: Architecture, Empirical Evidence, and the Reshaping of Software Engineering
  Agentic AI systems are shifting software engineering from line-level code generation to delegated repository-scale execution under supervision, with SWE-bench performance rising from 1.96% to 78.4% and productivity gains of 13.6-55.8%.
- More Is Different: Toward a Theory of Emergence in AI-Native Software Ecosystems
  AI-native software ecosystems exhibit emergent behaviors best explained by complex adaptive systems theory, requiring new ecosystem-level monitoring and seven testable propositions that may extend or replace Lehman's laws.
- Scaling Human-AI Coding Collaboration Requires a Governable Consensus Layer
  Agentic Consensus replaces code as the main artifact with a typed property graph world model that maintains commitments and evidence through synchronization operators, shifting evaluation to alignment fidelity and consensus entropy.
- Sema Code: Decoupling AI Coding Agents into Programmable, Embeddable Infrastructure
  Sema Code decouples AI coding agents into a programmable npm library with eight mechanisms for isolation, queuing, compression, scheduling, permissions, and integration.
- The AI Codebase Maturity Model: From Assisted Coding to Fully Autonomous Systems
  The AI Codebase Maturity Model defines six sequential levels of AI-driven development based on feedback loop topologies, validated by experience reports showing 5x PR and 37x issue throughput gains from level 2 to level 6.
- Accountable Agents in Software Engineering: An Analysis of Terms of Service and a Research Roadmap
  A comparative review of AI coding tool terms of service shows that responsibility for code quality and compliance has shifted to users, with policy misalignment for autonomous agents, plus a research roadmap.
- Recommendations for Efficient and Responsible LLM Adoption within Industrial Software Development
  A multi-case study plus survey produces seven actionable recommendations for efficient and responsible LLM use in industrial software engineering.
- Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models
  CTT is a compression pipeline for LLMs that achieves up to 49x memory reduction, 10x faster inference, 81% lower CO2 emissions, and retains 68-98% accuracy on code clone detection, summarization, and generation tasks.
- AI Observability for Developer Productivity Tools: Bridging Cost Awareness and Code Quality
  A combined observability platform for AI developer tools achieves under 2% cost variance from actual billing and speeds up usage insights by an order of magnitude through real token tracking and analytics over a six-month workflow.
- Building an Internal Coding Agent at Zup: Lessons and Open Questions
  Engineering choices for tools, safety guardrails, and human oversight determine whether an internal coding agent delivers value in practice more than the underlying model quality.