arxiv: 2511.02399 · v2 · submitted 2025-11-04 · 💻 cs.SE · cs.AI

EvoDev: An Iterative Feature-Driven Framework for End-to-End Software Development with LLM-based Agents

Junwei Liu, Chen Xu, Chong Wang, Tong Bai, Weitong Chen, Kaseng Wong, Yiling Lou, Xin Peng

Pith reviewed 2026-05-18 01:37 UTC · model grok-4.3

classification 💻 cs.SE cs.AI

keywords LLM agents · software development · feature-driven development · iterative framework · dependency modeling · context propagation · Android development

The pith

EvoDev's Feature Map models feature dependencies and propagates context, letting LLM agents outperform the best baseline, Claude Code, by 56.8 percent on Android tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes EvoDev, an iterative framework that decomposes requirements into features connected by a directed acyclic graph called the Feature Map. Each node of this graph carries multi-level details such as business logic, design, and code, which are passed along dependency links to give each development step richer context. The authors test the approach on demanding Android projects and report a 56.8 percent margin over the best baseline, Claude Code. A sympathetic reader would care because real software work involves repeated revisions and interdependencies that simple sequential pipelines fail to capture.

Core claim

EvoDev decomposes user requirements into a set of user-valued features and constructs a Feature Map, a directed acyclic graph that explicitly models dependencies between features. Each node in the feature map maintains multi-level information, including business logic, design, and code, which is propagated along dependencies to provide context for subsequent development iterations. Evaluation on challenging Android development tasks shows EvoDev outperforming the best baseline by 56.8 percent while raising single-agent results by 16.0 to 76.6 percent across base LLMs.

What carries the argument

The Feature Map, a directed acyclic graph of features that stores and propagates multi-level information along dependency edges to supply context during iterative development steps.
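As a rough illustration of what such a structure might look like, the sketch below models a Feature Map in Python. It is not the authors' implementation: the `FeatureNode` fields, the function names, the countdown-timer feature names, and the choice to pass along context from all transitive dependencies (rather than only direct ones) are assumptions made for this example.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class FeatureNode:
    """One feature with the three context levels named in the summary."""
    name: str
    business_logic: str = ""
    design: str = ""
    code: str = ""
    depends_on: list[str] = field(default_factory=list)


def development_order(features: dict[str, FeatureNode]) -> list[str]:
    """Return an order in which every feature comes after its dependencies."""
    graph = {name: set(node.depends_on) for name, node in features.items()}
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(graph).static_order())


def upstream_context(features: dict[str, FeatureNode], name: str) -> list[FeatureNode]:
    """Collect the nodes of all transitive dependencies (an assumed policy)."""
    seen: set[str] = set()
    stack = list(features[name].depends_on)
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(features[dep].depends_on)
    return [features[dep] for dep in sorted(seen)]


# Hypothetical features for a countdown-timer app; names are illustrative only.
features = {
    "set_timer": FeatureNode("set_timer", business_logic="user picks a duration"),
    "run_countdown": FeatureNode("run_countdown", depends_on=["set_timer"]),
    "notify_on_finish": FeatureNode("notify_on_finish", depends_on=["run_countdown"]),
}
order = development_order(features)
context_names = [n.name for n in upstream_context(features, "notify_on_finish")]
```

In this sketch, an agent working on `notify_on_finish` would receive the stored business logic, design, and code of both upstream features before it starts its own iteration.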

Load-bearing premise

The reported gains come mainly from the Feature Map's dependency modeling and context propagation rather than from unexamined choices in prompting, metrics, or which tasks were selected.

What would settle it

A controlled test that runs identical LLM agents on the same Android tasks once with the Feature Map enabled and once without it, then checks whether the performance difference disappears.
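A minimal sketch of how such a paired comparison could be summarized is shown below. The per-task scores are placeholders invented for the example and are not results from the paper, the function name `paired_summary` is hypothetical, and this page does not state which metric the paper uses or whether its percentages are relative or absolute.

```python
from math import sqrt
from statistics import mean, stdev


def paired_summary(with_map, without_map):
    """Summarize per-task scores from two runs over the same tasks."""
    if len(with_map) != len(without_map) or len(with_map) < 2:
        raise ValueError("need two equally long lists with at least two tasks")
    diffs = [a - b for a, b in zip(with_map, without_map)]
    mean_diff = mean(diffs)
    spread = stdev(diffs)  # sample standard deviation of the paired differences
    baseline = mean(without_map)
    return {
        "mean_diff": mean_diff,
        # Relative gain over the no-Feature-Map run; undefined for a zero baseline.
        "relative_gain": mean_diff / baseline if baseline != 0 else None,
        # Paired t statistic; undefined when every task shows the same difference.
        "t_stat": mean_diff / (spread / sqrt(len(diffs))) if spread > 0 else None,
    }


# Placeholder scores for four tasks; NOT results from the paper.
scores_with_map = [0.8, 0.6, 0.9, 0.7]
scores_without_map = [0.5, 0.5, 0.6, 0.4]
summary = paired_summary(scores_with_map, scores_without_map)
```

If the Feature Map were not responsible for the gains, the mean paired difference would be expected to sit near zero once everything else is held fixed.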

Figures

Figures reproduced from arXiv: 2511.02399 by Chen Xu, Chong Wang, Junwei Liu, Kaseng Wong, Tong Bai, Weitong Chen, Xin Peng, Yiling Lou.

**Figure 1.** Overview of the FDD-inspired EvoDev framework to extract a list of features, which are defined as user-valued functionalities that can be implemented within two weeks. The feature can also be integrated into feature sets, which contain a list of functionally cohesive features that can be treated as a whole. The next step is to plan by feature, where team members carefully consider the dependencies and pr…

**Figure 2.** The basic activities of Feature Driven Development

**Figure 3.** The requirement document for the countdown timer APP. Next, we introduce an Architect agent to construct an overall design of the target application. In FDD, the overall model refers to the domain object model. However, our preliminary experiments indicate that LLMs struggle to generate a usable domain object model, which is also reported in prior work [12, 13, 19]. Therefore, we opt to leverage LLMs for da…

**Figure 4.** The overall design (including both UI and data) for…

**Figure 5.** The feature list for the countdown timer APP. The…

**Figure 6.** The feature map for the countdown timer APP, with each node containing contexts of the business, design, and…

Original abstract

Recent advances in large language model agents offer the promise of automating end-to-end software development from natural language requirements. However, existing approaches largely adopt linear, waterfall-style pipelines, which oversimplify the iterative nature of real-world development and struggle with complex, large-scale projects. To address these limitations, we propose EvoDev, an iterative software development framework inspired by feature-driven development. EvoDev decomposes user requirements into a set of user-valued features and constructs a Feature Map, a directed acyclic graph that explicitly models dependencies between features. Each node in the feature map maintains multi-level information, including business logic, design, and code, which is propagated along dependencies to provide context for subsequent development iterations. We evaluate EvoDev on challenging Android development tasks and show that it outperforms the best-performing baseline, Claude Code, by a substantial margin of 56.8%, while improving single-agent performance by 16.0%-76.6% across different base LLMs, highlighting the importance of dependency modeling, context propagation, and workflow-aware agent design for complex software projects. Our work summarizes practical insights for designing iterative, LLM-driven development frameworks and informs future training of base LLMs to better support iterative software development.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

EvoDev adds a DAG-based Feature Map with multi-level context propagation to handle dependencies in LLM agent workflows for software development, but the performance numbers need ablations to show what actually drives the gains.

The main point is that EvoDev turns requirements into a directed acyclic graph of features and propagates business logic, design, and code context along the dependency edges to support iterative agent work. This framing moves past the linear pipelines common in earlier agent papers and tries to match how real projects handle interrelated pieces. The Android task results and the lifts over Claude Code plus the single-agent improvements across models show the overall loop can produce usable output on non-trivial examples. The practical insights section also pulls together some direct advice on workflow design that others could apply.

The soft spot sits in the evidence for why the gains happen. The abstract reports a 56.8 percent margin and the 16 to 76.6 percent single-agent range but gives no breakdown of baselines, metrics, statistical checks, or ablations that would turn off only the dependency edges and propagation while leaving iteration and agent roles intact. Without those controls it remains unclear whether the Feature Map itself accounts for the difference or whether prompt volume, call count, or task selection plays a larger role. The stress-test note on missing attribution checks matches what the abstract shows.

This paper is for people building or studying multi-agent systems aimed at end-to-end coding tasks. A reader working on context management or dependency handling in agent frameworks would find the concrete mechanisms worth examining even before the numbers receive tighter validation.

I would send it to peer review. The proposal has enough structure and addresses a genuine limitation in current approaches, so referees can press on the experimental design and reproducibility rather than rejecting it outright.

Referee Report

2 major / 2 minor

Summary. The paper introduces EvoDev, an iterative framework for end-to-end software development with LLM-based agents. It decomposes natural language requirements into user-valued features, constructs a directed acyclic graph (Feature Map) to explicitly model inter-feature dependencies, and propagates multi-level context (business logic, design, and code) along dependency edges to inform subsequent iterations. The central empirical claim is that EvoDev outperforms the strongest baseline (Claude Code) by 56.8% and improves single-agent performance by 16.0%-76.6% across base LLMs when evaluated on challenging Android development tasks.

Significance. If the reported gains are shown to arise specifically from the DAG-based dependency modeling and context propagation rather than from unstated prompting or iteration details, the work would be significant for shifting LLM-agent research away from linear waterfall pipelines toward more realistic iterative, dependency-aware workflows. It supplies practical design insights and could inform future LLM fine-tuning for software engineering.

major comments (2)

Abstract and §4 (Evaluation): The central performance claims (56.8% margin over Claude Code; 16.0%-76.6% single-agent lifts) are stated without any description of the evaluation metrics, task selection criteria, number of trials, baseline implementations, statistical tests, or error analysis. This absence is load-bearing because the attribution of gains to the Feature Map cannot be assessed without these controls.
§4 (Experiments): No ablation is reported that disables only the dependency edges and multi-level context propagation while preserving the iterative loop, agent roles, and feature decomposition. Without this isolation, it remains possible that the observed margins stem from differences in total LLM calls, prompt length, or task curation rather than the claimed DAG modeling.

minor comments (2)

§2 (Related Work): The positioning against prior feature-driven development literature and existing LLM-agent frameworks (e.g., those using planning or reflection) would benefit from a more explicit comparison table.
§3 (Notation): The multi-level information maintained at each Feature Map node is described in prose but never formalized (e.g., as a tuple or record type), which reduces clarity when discussing propagation.
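
One way the formalization requested in the notation comment could look is sketched below. All symbols are assumed for illustration and do not appear in the paper; in particular, whether context flows from all ancestors or only from direct dependencies is not stated on this page.

```latex
% Assumed notation; none of these symbols are taken from the paper.
Let $G = (F, E)$ be a directed acyclic graph over features, where
$(g, f) \in E$ means that feature $f$ depends on feature $g$.
Each node carries a record
\[
  I(f) = (b_f,\ d_f,\ c_f),
\]
with $b_f$ the business logic, $d_f$ the design, and $c_f$ the code of $f$.
The context propagated to $f$ can then be written as
\[
  \mathrm{ctx}(f) = \{\, I(g) : g \in \mathrm{Anc}_G(f) \,\},
\]
where $\mathrm{Anc}_G(f)$ is the set of features from which $f$ is reachable in $G$.
```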

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps strengthen the clarity and rigor of our evaluation section. We address each major comment below and will revise the manuscript to incorporate the requested details and experiments.

Referee: Abstract and §4 (Evaluation): The central performance claims (56.8% margin over Claude Code; 16.0%-76.6% single-agent lifts) are stated without any description of the evaluation metrics, task selection criteria, number of trials, baseline implementations, statistical tests, or error analysis. This absence is load-bearing because the attribution of gains to the Feature Map cannot be assessed without these controls.

Authors: We agree that the evaluation protocol requires more explicit and prominent description to support attribution of gains to the Feature Map. While §4 of the manuscript outlines the Android development tasks and overall setup, we acknowledge the need for greater detail on metrics, controls, and analysis. In the revised manuscript we will expand both the abstract and §4 to specify: the primary metric (feature-level task completion rate combining business-logic correctness and executable code), secondary metrics (design consistency and dependency resolution success), task selection criteria (Android apps requiring 5–12 interdependent features drawn from real-world requirement sets), number of trials (five independent runs per condition with reported standard deviation), baseline implementations (exact prompting and iteration limits used for Claude Code and other agents), statistical tests (paired t-tests with p-values), and error analysis (categorization of failures such as missed dependencies versus implementation bugs). These additions will make the evidence for the DAG-based context propagation more transparent.

Revision: yes

Referee: §4 (Experiments): No ablation is reported that disables only the dependency edges and multi-level context propagation while preserving the iterative loop, agent roles, and feature decomposition. Without this isolation, it remains possible that the observed margins stem from differences in total LLM calls, prompt length, or task curation rather than the claimed DAG modeling.

Authors: The referee correctly notes that our existing comparisons are against external baselines lacking the full EvoDev pipeline, which does not isolate the contribution of the dependency edges and context propagation. We will add a targeted ablation in the revised §4: a control variant that retains the iterative loop, agent roles, and feature decomposition but replaces the DAG with either independent feature processing or a fixed linear order, while matching total LLM calls and prompt lengths as closely as possible. Performance differences between this ablation and the full EvoDev will be reported to demonstrate that the observed margins arise specifically from dependency-aware multi-level context propagation rather than iteration count or task selection alone.

Revision: yes
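
The three conditions named in this response (full DAG, fixed linear order, independent processing) differ only in which earlier features supply context. The sketch below makes that difference concrete. It is an assumption-laden illustration, not the authors' ablation: `context_for`, the variant labels, and the feature names are hypothetical, and the sketch does not attempt to match LLM call counts or prompt lengths.

```python
def context_for(feature, order, deps, variant):
    """Return the names of features whose context the agent would see."""
    if variant == "independent":
        # Each feature is developed with no upstream context at all.
        return []
    if variant == "linear":
        # Every earlier feature in the fixed order counts as context.
        return order[: order.index(feature)]
    if variant == "dag":
        # Only transitive dependencies in the graph count as context.
        seen, stack = set(), list(deps.get(feature, ()))
        while stack:
            dep = stack.pop()
            if dep not in seen:
                seen.add(dep)
                stack.extend(deps.get(dep, ()))
        return [name for name in order if name in seen]
    raise ValueError(f"unknown variant: {variant}")


# Hypothetical feature order and dependency edges, for illustration only.
order = ["set_duration", "start_countdown", "pause_resume", "finish_alert"]
deps = {
    "start_countdown": ["set_duration"],
    "pause_resume": ["start_countdown"],
    "finish_alert": ["start_countdown"],
}
dag_context = context_for("finish_alert", order, deps, "dag")
linear_context = context_for("finish_alert", order, deps, "linear")
independent_context = context_for("finish_alert", order, deps, "independent")
```

Here the linear variant hands `finish_alert` the unrelated `pause_resume` context, while the DAG variant leaves it out, which is the kind of difference the proposed ablation would need to measure.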

Circularity Check

0 steps flagged

No circularity: empirical framework evaluation with no self-referential derivations

full rationale

The paper presents EvoDev as an iterative framework that decomposes requirements into features, builds a Feature Map DAG, and propagates multi-level context along dependencies. All central claims concern measured performance lifts (56.8% over Claude Code, 16.0%-76.6% single-agent gains) obtained from direct experimental comparison on Android tasks. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described framework. The reported results are therefore independent of the inputs by construction and rest on external benchmarks rather than any reduction to the framework's own definitions or prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the untested assumption that structured dependency modeling and context passing will reliably improve LLM agent outputs on complex tasks; the Feature Map is introduced as a new construct without external validation.

axioms (1)

domain assumption: LLM agents benefit substantially from explicit dependency modeling and context propagation when performing iterative software development.
Invoked to explain why the Feature Map produces the reported gains over linear baselines.

invented entities (1)

Feature Map (no independent evidence)
purpose: Directed acyclic graph that stores and propagates multi-level information (business logic, design, code) across feature dependencies.
Newly defined structure central to the iterative workflow; no independent evidence provided outside the paper's evaluation.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "EvoDev decomposes user requirements into a set of user-valued features and constructs a Feature Map, a directed acyclic graph that explicitly models dependencies between features. Each node in the feature map maintains multi-level information, including business logic, design, and code, which is propagated along dependencies"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "We evaluate EvoDev on challenging Android development tasks and show that it outperforms the best-performing baseline, Claude Code, by a substantial margin of 56.8%"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Large Language Model-Based Agents for Software Engineering: A Survey
cs.SE 2024-09 unverdicted novelty 4.0

A literature survey that collects and categorizes 124 papers on LLM-based agents for software engineering from SE and agent perspectives.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1] Andi Fitriah Abdul Kadir, Arash Habibi Lashkari, and Mahdi Daghmehchi Firoozjaei. 2024. Android Operating System. Springer Nature Switzerland, Cham, 25–42. doi:10.1007/978-3-031-48865-8_2
[2] Azat Abdullin, Pouria Derakhshanfar, and Annibale Panichella. 2025. Test Wars: A Comparative Study of SBST, Symbolic Execution, and LLM-Based Approaches to Unit Test Generation. In 2025 IEEE Conference on Software Testing, Verification and Validation (ICST). IEEE, 221–232
[3] Yasaman Abedini and Abbas Heydarnoori. 2025. Leveraging Large Language Models for Classifying App Users’ Feedback. arXiv preprint arXiv:2507.08250 (2025)
[4] S Akilesh, Rajeev Sekar, Om Kumar CU, D Prakalya, and M Suguna. 2025. Multi-Agent hierarchical workflow for autonomous code generation with Large Language Models. In 2025 IEEE International Students’ Conference on Electrical, Electronics and Computer Science (SCEECS). IEEE, 1–5
[5] Alibaba Cloud (implied, as Qwen series is developed by Alibaba Cloud). 2024. Qwen3 Coder - Agentic Coding Adventure. Web page. https://qwen3lm.com/
[6] Anthropic. 2024. Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku. Web page. https://www.anthropic.com/news/3-5-models-and-computer-use
[7] Anthropic. 2025. Claude Code. https://www.anthropic.com/claude-code
[8] Anthropic. 2025. Claude Sonnet 4. Web page. https://www.anthropic.com/claude/sonnet
[9] AntonOsika. 2025. GPT-Engineer. https://github.com/AntonOsika/gpt-engineer
[10] Benoit Baudry, Khashayar Etemadi, Sen Fang, Yogya Gamage, Yi Liu, Yuxin Liu, Martin Monperrus, Javier Ron, André Silva, and Deepika Tiwari. 2024. Generative AI to generate test data generators. IEEE Software 41, 6 (2024), 55–64
[11–12] M Bialy, V Pantelic, J Jaskolka, A Schaap, L Patcas, M Lawford, and A Wassyng. Software engineering for model-based development by domain experts. In Handbook of system safety and security. Elsevier, 39–64
[13] Kua Chen, Yujing Yang, Boqi Chen, José Antonio Hernández López, Gunter Mussbacher, and Dániel Varró. 2023. Automated domain modeling with large language models: A comparative study. In 2023 ACM/IEEE 26th International Conference on Model Driven Engineering Languages and Systems (MODELS). IEEE, 162–172
[14] Ru Chen, Jingwei Shen, and Xiao He. 2024. A Model Is Not Built By A Single Prompt: LLM-Based Domain Modeling With Question Decomposition. CoRR abs/2410.09854 (2024). arXiv:2410.09854 doi:10.48550/ARXIV.2410.09854
[15] Matteo Ciniselli, Alberto Martin-Lopez, and Gabriele Bavota. 2024. On the generalizability of deep learning-based code completion across programming language versions. In Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension. 99–111
[16] Alibaba Cloud. 2024. Tongyi Large Language Models: The First Choice for Enterprises Embracing the AI Era. Web page. https://www.aliyun.com/product/tongyi
[17] John W Creswell and J David Creswell. 2017. Research design: Qualitative, quantitative, and mixed methods approaches. Sage publications
[18] Leuson Da Silva, Jordan Samhi, and Foutse Khomh. 2025. LLMs and Stack Overflow discussions: Reliability, impact, and challenges. Journal of Systems and Software (2025), 112541
[19] Eric Evans. 2004. Domain-driven design: tackling complexity in the heart of software. Addison-Wesley Professional
[20] Yingqiang Ge, Wenyue Hua, Kai Mei, Juntao Tan, Shuyuan Xu, Zelong Li, Yongfeng Zhang, et al. 2023. Openagi: When llm meets domain experts. Advances in Neural Information Processing Systems 36 (2023), 5539–5568
[21] Sadhna Goyal. 2007. Agile techniques for project management and software engineering. In Major Seminar on Feature Driven Development. 1–19
[22] Renda Han. 2024. Survey: The Evolution and Future of Android Software Development. Deep Learning and Pattern Recognition 1, 1 (2024)
[23] Stefanus A Haryono, Ferdian Thung, David Lo, Lingxiao Jiang, Julia Lawall, Hong Jin Kang, Lucas Serrano, and Gilles Muller. 2021. Androevolve: Automated update for android deprecated-api usages. In 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). IEEE, 1–4
[24] Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. 2024. MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework. In The Twelfth International Conference on Learning Representatio...
[25] Yue Hu, Yuzhu Cai, Yaxin Du, Xinyu Zhu, Xiangrui Liu, Zijie Yu, Yuchen Hou, Shuo Tang, and Siheng Chen. 2025. Self-Evolving Multi-Agent Collaboration Networks for Software Development. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net. https://openreview.net/forum?id=4R71pdPBZp
[26] LangChain Inc. 2024. LangGraph: Balance Agent Control with Agency. Web page. https://www.langchain.com/langgraph
[27] Hong Yi Lin, Chunhua Liu, Haoyu Gao, Patanamon Thongtanunam, and Christoph Treude. 2025. CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models. arXiv preprint arXiv:2503.16167 (2025)
[28] Junwei Liu, Yixuan Chen, Mingwei Liu, Xin Peng, and Yiling Lou. 2024. STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis. CoRR abs/2406.10018 (2024). arXiv:2406.10018 doi:10.48550/ARXIV.2406.10018
[29] Jie Liu, Guohua Wang, Ronghui Yang, Mengchen Zhao, and Yi Cai. [n. d.]. AltDev: Achieving Real-Time Alignment in Multi-Agent Software Development. ([n. d.])
[30] Junwei Liu, Kaixin Wang, Yixuan Chen, Xin Peng, Zhenpeng Chen, Lingming Zhang, and Yiling Lou. 2024. Large Language Model-Based Agents for Software Engineering: A Survey. CoRR abs/2409.02977 (2024). arXiv:2409.02977 doi:10.48550/ARXIV.2409.02977
[31] Lovable. 2025. Lovable. https://lovable.dev/
[32] Shriraj Mandulapalli, Emilio Hernandez, Wayne Jordan Hall, Alireza Chakeri, and Luis Jaimes. 2025. Development of Agentic Workflows with LangGraph for Software Development Life Cycle Automation. In North American Conference on Industrial Engineering and Operations Management-Computer Science Tracks. Springer, 45–54
[33–34] Minh Huynh Nguyen, Thang Phan Chau, Phong X. Nguyen, and Nghi D. Q. Bui. AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology. In IEEE/ACM Second International Conference on AI Foundation Models and Software Engineering, Forge@ICSE 2025, Ottawa, ON, Canada, April 27-28, 2025. IEEE, 156–167. doi:10.1109/FORGE66646.2025.00026
[35] OpenAI. 2025. Introducing GPT-4.1 in the API. Web page. https://openai.com/index/gpt-4-1/
[36] Kai Petersen, Claes Wohlin, and Dejan Baca. 2009. The waterfall model in large-scale development. In International Conference on Product-Focused Software Process Improvement. Springer, 386–400
[37] Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, and Maosong Sun. 2024. ChatDev: Communicative Agents for Software Development. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangk... doi:10.18653/v1/2024.acl-long.810
[38] John Robinson. 2024. Likert scale. In Encyclopedia of quality of life and well-being research. Springer, 3917–3918
[39] Adriana Sejfia, Satyaki Das, Saad Shafiq, and Nenad Medvidović. 2024. Toward improved deep learning-based vulnerability detection. In Proceedings of the 46th IEEE/ACM international conference on software engineering. 1–12
[40] Purva Sharma and Jayakumar Kaliappan. 2025. Optimised Intelligent Software Company Management System using Multi-Agent Framework. Grenze International Journal of Engineering & Technology (GIJET) 11 (2025)
[41] Syed Tauhid Ullah Shah, Mohammad Hussein, Ann Barcomb, and Mohammad Moshirpour. 2025. Explainability as a Compliance Requirement: What Regulated Industries Need from AI Tools for Design Artifact Generation. arXiv e-prints (2025), arXiv–2507
[42] Sergey Titov, Mikhail Evtikhiev, Anton Shapkin, Oleg Smirnov, Sergei Boytsov, Dariia Karaeva, Maksim Sheptyakov, Mikhail Arkhipov, Timofey Bryksin, and Egor Bogomolov. 2024. Kotlin ML Pack: Technical Report. CoRR abs/2405.19250 (2024). arXiv:2405.19250 doi:10.48550/ARXIV.2405.19250
[43] Jim Whitehead. 2007. Collaboration in software engineering: A roadmap. In Future of Software Engineering (FOSE’07). IEEE, 214–225
[44] Simiao Zhang, Jiaping Wang, Guoliang Dong, Jun Sun, Yueling Zhang, and Geguang Pu. 2024. Experimenting a New Programming Practice with LLMs. CoRR abs/2401.01062 (2024). arXiv:2401.01062 doi:10.48550/ARXIV.2401.01062