Microskill Architecture: A Modular Skill-Driven Framework for AI-Native Code Generation

Mohammad Zare; Omid Abdolrahmani

arxiv: 2606.05720 · v1 · pith:7XNFGVWVnew · submitted 2026-06-04 · 💻 cs.SE · cs.AI

Pith reviewed 2026-06-28 00:32 UTC · model grok-4.3

keywords MicroSkill Architecture · AI code generation · context management · skill capsules · dynamic routing · token optimization · modular knowledge · self-learning mechanism

The pith

MicroSkill Architecture partitions project knowledge into atomic skill capsules selected by a dynamic router, cutting token use by over 90% and nearly doubling first-try compilation success in AI code generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MicroSkill Architecture as a modular framework that breaks codebase knowledge into small, focused skill capsules rather than loading full project context into language models. A dynamic router then chooses only the semantically relevant capsules for each generation task, framed as constrained optimization under a token budget. An enterprise case study with fifteen complex features demonstrates over 90 percent token reduction, nearly doubled first-try success rates, zero architectural violations, and autonomous discovery of seven new capsules through self-learning. A reader would care because full-context approaches cause high costs, lost information, and drift, while this offers a way to maintain accuracy at scale.

Core claim

By partitioning knowledge into atomic, sharply scoped skill capsules and routing only relevant subsets under token constraints, the architecture enables efficient, reliable AI-native code generation that avoids mid-sequence loss and architectural drift, as shown by the measured gains in the fifteen-feature enterprise system and the self-learning extraction of additional capsules.

What carries the argument

MicroSkill Architecture, which partitions project knowledge into atomic skill capsules and uses a dynamic router to select semantically relevant subsets subject to a token budget.
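The routing idea can be illustrated with a minimal sketch. It is not the authors' router: the abstract gives no algorithm, so the greedy relevance-per-token heuristic, the `Capsule` fields, the scores, and the capsule names below are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capsule:
    name: str
    tokens: int       # cost of placing the capsule in the prompt (assumed > 0)
    relevance: float  # hypothetical relevance score for the current task


def route(capsules, budget):
    """Greedily pick capsules by relevance per token while the budget allows."""
    chosen, used = [], 0
    ranked = sorted(capsules, key=lambda c: c.relevance / c.tokens, reverse=True)
    for capsule in ranked:
        # Skip irrelevant capsules and any capsule that would exceed the budget.
        if capsule.relevance > 0 and used + capsule.tokens <= budget:
            chosen.append(capsule)
            used += capsule.tokens
    return chosen, used


# Hypothetical capsules and scores, invented for this example.
capsules = [
    Capsule("auth-rules", 400, 0.9),
    Capsule("db-schema", 700, 0.7),
    Capsule("ui-theme", 300, 0.1),
    Capsule("api-conventions", 500, 0.8),
]

selected, used = route(capsules, budget=1000)
print([c.name for c in selected], used)
```

With a budget of 1000 tokens the sketch keeps `auth-rules` and `api-conventions` and drops `db-schema`, which shows how a budgeted router can silently leave out a capsule that a task may still need.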

If this is right

Token consumption drops by over 90 percent relative to full codebase injection.
First-try compilation success rates nearly double.
Architectural violations are eliminated entirely.
The system autonomously extracts and registers new skill capsules through self-learning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same partitioning and routing logic could apply to non-code tasks such as automated documentation or test generation where context limits matter.
If capsules prove reusable across unrelated projects, organizations could maintain shared skill libraries that reduce per-project setup costs.
Scaling the approach to codebases an order of magnitude larger would test whether the atomic-partition assumption remains stable.
Embedding the router inside existing IDEs would let developers invoke the system without manually managing context.

Load-bearing premise

Project knowledge can be reliably partitioned into atomic, non-overlapping skill capsules such that a dynamic router can select a subset that preserves all necessary information without omissions or inconsistencies.

What would settle it

A controlled test on the same enterprise system where the router omits a required capsule for one feature, producing code that compiles but introduces an architectural violation or functional omission.
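One way such a test could be scored is sketched below, assuming a reviewer has listed the capsules each feature truly needs. The feature names, capsule names, and the `missing_capsules` helper are hypothetical; the paper does not describe a completeness check.

```python
def missing_capsules(required, selected):
    """Return the required capsule names that the router did not select."""
    return sorted(set(required) - set(selected))


# Hypothetical ground truth: capsules each feature needs, as judged by a reviewer.
required_by_feature = {
    "publish-workflow": {"auth-rules", "api-conventions", "audit-log"},
    "media-upload": {"storage-policy", "api-conventions"},
}

# Hypothetical router output for the same features.
selected_by_feature = {
    "publish-workflow": {"auth-rules", "api-conventions"},
    "media-upload": {"storage-policy", "api-conventions", "ui-theme"},
}

# Keep only the features where at least one required capsule was omitted.
omissions = {}
for feature, required in required_by_feature.items():
    missing = missing_capsules(required, selected_by_feature.get(feature, set()))
    if missing:
        omissions[feature] = missing

print(omissions)
```

Any non-empty entry marks a feature whose generated code should then be inspected for an architectural violation or functional omission, even if it compiles.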

Original abstract

Large language models and AI coding agents have reshaped software development, but the path to fully AI-native systems faces structural challenges. Chief among them is managing context windows without losing accuracy or efficiency. When developers inject full project documentation and code into a model's memory, the model loses mid-sequence information, token costs spiral, and architecture drifts. This paper presents MicroSkill Architecture: a modular design paradigm inspired by microservices, applied to knowledge encapsulation instead of service decomposition. Instead of feeding an agent the entire codebase, the architecture partitions knowledge into atomic, sharply scoped skill capsules, and a dynamic router selects only semantically relevant capsules for the task. We formally model context allocation as constrained optimization over semantic relevance subject to a token budget. An empirical case study an enterprise content management system with fifteen complex features shows that MicroSkill cuts token consumption by over 90%, nearly doubles first-try compilation success rates, eliminates architectural violations entirely, and enables autonomous extraction and registration of seven new skill capsules via a self-learning mechanism. These findings suggest MicroSkill Architecture offers a scalable foundation for building AI-native development systems that are more efficient, more reliable, and capable of evolving over time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The abstract's big claims on token cuts and reliability rest on an unvalidated assumption that project knowledge splits cleanly into atomic non-overlapping capsules a router can always pick without omissions.

The main thing to know is that this paper frames context management for AI code generation as a microservices-style partitioning of knowledge into skill capsules, with a dynamic router and constrained optimization to stay under token limits, plus a self-learning step that extracts new capsules. The case study on a 15-feature enterprise CMS is presented as showing over 90% token reduction, nearly doubled first-try compilation success, zero architectural violations, and seven new capsules extracted autonomously.

What is actually new is the explicit microservices analogy for knowledge encapsulation and the self-learning registration mechanism. The formal model of context allocation as constrained optimization over semantic relevance is a clear step beyond loose retrieval ideas. The paper does a solid job naming the real scaling problem with full-project context in LLMs.

The soft spots are in the evidence and validation. All headline numbers come from one case study with no methods, no data tables, no error analysis, and no baseline comparisons in the abstract. The optimization formulation is mentioned but not shown, so there is no way to check whether the router selections were information-complete. The central assumption—that capsules can be made atomic and non-overlapping and that the router never drops something later needed—remains untested in what is provided. If that assumption fails on even one feature, the reported gains become case-specific rather than general support for the architecture.

This is for researchers working on modular RAG or agent systems for code generation who want a structured framing. A reader looking for reproducible techniques or rigorous evaluation will find the current version thin. It deserves peer review so the full paper can be examined for the missing formulation, experiments, and comparisons; the idea is worth checking if the details hold up.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces MicroSkill Architecture, a modular framework for AI-native code generation inspired by microservices. Project knowledge is partitioned into atomic skill capsules; a dynamic router selects semantically relevant subsets under a token budget, which is formally modeled as constrained optimization over semantic relevance. An empirical case study on an enterprise content management system with fifteen complex features reports that the approach cuts token consumption by over 90%, nearly doubles first-try compilation success rates, eliminates architectural violations, and enables autonomous extraction and registration of seven new skill capsules via a self-learning mechanism.

Significance. If the empirical results and underlying model are substantiated with full methodological detail, the work could meaningfully advance context management in LLM-based code generation by reducing token costs and improving reliability while adding a self-evolution capability. The constrained-optimization framing and modular encapsulation align with software-engineering principles and could inform scalable AI development tools.

major comments (3)

[Abstract] Abstract: The manuscript states that context allocation is modeled as constrained optimization over semantic relevance subject to a token budget, yet neither the objective function, decision variables, constraints, nor solution method are specified. This omission is load-bearing for the central token-reduction and correctness claims.
[Case Study] Case study description: No information is supplied on the initial partitioning procedure into skill capsules, the router's selection algorithm, any completeness checks, or statistical details (baselines, trial counts, variance). Without these, the reported 90%+ token savings, doubled success rates, and zero violations cannot be evaluated for reproducibility or sensitivity to the partitioning assumption.
[Abstract] Abstract and architecture description: The strongest empirical outcomes presuppose that knowledge can be partitioned into atomic, non-overlapping capsules whose union the router can always recover without omissions; no validation, sensitivity analysis, or counter-example testing of this assumption is described, leaving the headline metrics dependent on an untested precondition.

minor comments (1)

[Abstract] Abstract: The sentence 'An empirical case study an enterprise content management system' is missing the preposition 'on'.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the insightful comments. We address each major point below and will revise the manuscript accordingly to improve methodological transparency and reproducibility.

Referee: [Abstract] Abstract: The manuscript states that context allocation is modeled as constrained optimization over semantic relevance subject to a token budget, yet neither the objective function, decision variables, constraints, nor solution method are specified. This omission is load-bearing for the central token-reduction and correctness claims.

Authors: We agree that the formal details are required to substantiate the claims. The revised manuscript will add an explicit mathematical section defining the objective function (maximizing aggregate semantic relevance), decision variables (binary inclusion indicators for each capsule), constraints (token budget and non-overlap), and the solution procedure (e.g., integer linear programming or greedy approximation).
revision: yes

Referee: [Case Study] Case study description: No information is supplied on the initial partitioning procedure into skill capsules, the router's selection algorithm, any completeness checks, or statistical details (baselines, trial counts, variance). Without these, the reported 90%+ token savings, doubled success rates, and zero violations cannot be evaluated for reproducibility or sensitivity to the partitioning assumption.

Authors: We acknowledge the current description lacks these elements. The revision will document the initial partitioning method used on the enterprise codebase, the router's semantic selection algorithm, completeness verification steps, and statistical details including baselines, trial counts, and variance to support reproducibility assessment.
revision: yes

Referee: [Abstract] Abstract and architecture description: The strongest empirical outcomes presuppose that knowledge can be partitioned into atomic, non-overlapping capsules whose union the router can always recover without omissions; no validation, sensitivity analysis, or counter-example testing of this assumption is described, leaving the headline metrics dependent on an untested precondition.

Authors: The atomic non-overlapping partition is a foundational assumption. We will add a dedicated subsection that reports any validation performed in the case study, includes sensitivity analysis on capsule granularity where data permit, and discusses limitations or potential counter-examples to the assumption.
revision: yes
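For orientation, the formulation the authors outline in their first response would plausibly take the shape of a 0-1 knapsack problem with pairwise exclusions. The symbols are assumptions, since the paper's own notation is not available here: $r_i$ is the semantic relevance of capsule $i$ to the task, $t_i$ its token cost, $B$ the token budget, $x_i$ the binary inclusion indicator, and $O$ a set of capsule pairs judged to overlap.

```latex
\begin{aligned}
\max_{x \in \{0,1\}^{n}} \quad & \sum_{i=1}^{n} r_i \, x_i \\
\text{subject to} \quad & \sum_{i=1}^{n} t_i \, x_i \le B, \\
& x_i + x_j \le 1 \qquad \forall (i, j) \in O .
\end{aligned}
```

Without the pairwise constraints this reduces to the standard 0-1 knapsack problem, which is NP-hard in general; that would be consistent with the authors naming integer linear programming or a greedy approximation as candidate solution procedures.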

Circularity Check

0 steps flagged

No circularity; derivation is self-contained empirical architecture

The paper states a modular partitioning into skill capsules, a dynamic router, and a constrained optimization model for token-budgeted context allocation, then reports empirical outcomes (token reduction, success rates, zero violations, autonomous capsule extraction) on one 15-feature CMS case study. No equations, self-definitions, fitted-input predictions, or self-citation chains appear in the provided text that would reduce any claimed result to its own inputs by construction. The optimization is presented as a modeling choice rather than a tautology, and the self-learning mechanism is described as an observed capability without circular formulation. This is the normal case of an empirical framework whose central claims rest on external validation rather than definitional reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review provides no explicit free parameters, axioms, or independent evidence for invented entities; the skill capsule concept is introduced but cannot be audited for supporting evidence or derivation.

invented entities (1)

skill capsules (no independent evidence)
purpose: Atomic, sharply scoped units of project knowledge for modular selection
note: Core building block of the architecture introduced in the abstract with no independent evidence or falsifiable handle provided.
