TypeScript Repository Indexing for Code Agent Retrieval
Pith reviewed 2026-05-10 04:05 UTC · model grok-4.3
The pith
A parser built directly on the TypeScript Compiler API builds reliable UniAST indexes for large repositories much faster than LSP-based methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that abcoder-ts-parser, built directly on the TypeScript Compiler API, produces reliable UniAST indexes for TypeScript repositories up to 1.2 million lines of code significantly more efficiently than the existing architecture that augments AST parsers with language-server protocol calls.
What carries the argument
abcoder-ts-parser, which traverses the TypeScript compiler's native AST together with its semantic information and module-resolution logic to construct the function-level UniAST index without per-symbol RPC calls.
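To make the idea concrete, here is a minimal sketch of an in-process traversal that uses only the public TypeScript Compiler API. It is not the abcoder-ts-parser source. The in-memory files, the `collectCallEdges` helper, and the `file#name` edge format are hypothetical, and a real UniAST index would carry much more than call edges.

```typescript
import * as ts from "typescript";

// Hypothetical two-file project held in memory so the sketch needs no disk access.
const files: Record<string, string> = {
  "/a.ts": 'import { helper } from "./b";\nexport function main() { return helper(1); }\n',
  "/b.ts": "export function helper(x: number) { return x + 1; }\n",
};
const has = (name: string): boolean => Object.prototype.hasOwnProperty.call(files, name);

// noLib skips lib.d.ts and types: [] skips @types discovery; fine for this toy project.
const options: ts.CompilerOptions = { noLib: true, types: [], module: ts.ModuleKind.CommonJS };

// Start from the default host and redirect file access to the in-memory map.
const host = ts.createCompilerHost(options);
host.fileExists = has;
host.readFile = (name) => (has(name) ? files[name] : undefined);
host.directoryExists = () => true;
host.getDirectories = () => [];
host.realpath = (path) => path;
host.getCurrentDirectory = () => "/";
host.getSourceFile = (name, languageVersion) =>
  has(name) ? ts.createSourceFile(name, files[name], languageVersion, true) : undefined;

const program = ts.createProgram(Object.keys(files), options, host);

// Walk every source file and record "caller -> callee" edges for named functions.
function collectCallEdges(prog: ts.Program): string[] {
  const checker = prog.getTypeChecker();
  const edges: string[] = [];

  const visit = (node: ts.Node, caller: string | undefined): void => {
    // Entering a named function declaration changes the current caller.
    if (ts.isFunctionDeclaration(node) && node.name) {
      caller = `${node.getSourceFile().fileName}#${node.name.text}`;
    }
    if (ts.isCallExpression(node) && caller) {
      let symbol = checker.getSymbolAtLocation(node.expression);
      // Imported names are aliases; follow them to the original declaration.
      if (symbol && symbol.flags & ts.SymbolFlags.Alias) {
        symbol = checker.getAliasedSymbol(symbol);
      }
      const decl = symbol?.declarations?.[0];
      if (symbol && decl) {
        edges.push(`${caller} -> ${decl.getSourceFile().fileName}#${symbol.getName()}`);
      }
    }
    ts.forEachChild(node, (child) => visit(child, caller));
  };

  for (const sourceFile of prog.getSourceFiles()) {
    if (!sourceFile.isDeclarationFile) visit(sourceFile, undefined);
  }
  return edges;
}

const edges = collectCallEdges(program);
console.log(edges);
```

The cross-file call is resolved by the compiler's own module resolver and type checker inside one process, so no request leaves the process for each symbol. Whether this covers every relationship the LSP route provided is exactly the open question raised under the load-bearing premise.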
If this is right
- Graph-based retrieval that keeps call chains and dependency links becomes feasible for TypeScript codebases that previously timed out during indexing.
- LLM code agents can obtain richer context from larger repositories without incurring the latency of repeated language-server lookups.
- The UniAST index can be refreshed more frequently during development because each rebuild finishes in less time.
- The same direct-compiler pattern removes a scaling obstacle that affects any system trying to maintain semantic code graphs at repository scale.
Where Pith is reading between the lines
- Similar direct-compiler parsers could be written for other languages whose compilers expose comparable semantic APIs, potentially improving indexing speed across more of the code-agent ecosystem.
- Faster indexing may allow agents to maintain live indexes that update as the developer edits code rather than requiring full rebuilds between sessions.
- If the indexes prove complete, downstream tasks such as impact analysis or automated refactoring could also benefit from the same lightweight graph structure.
Load-bearing premise
That the TypeScript Compiler API supplies all needed semantic relationships and call chains for the UniAST index without the completeness gaps that originally led to adding language-server calls.
What would settle it
A side-by-side run of the new parser and the prior LSP-based parser on one of the 1.2-million-line projects that records both wall-clock indexing time and a manual or automated check of whether the extracted call chains and dependencies match.
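A side-by-side check of this kind could be scored with something like the sketch below. It assumes each parser's output has already been flattened into `caller -> callee` strings. The `IndexRun` and `Comparison` shapes and the `compareRuns` helper are hypothetical, and the numbers are placeholders, not measurements from the paper.

```typescript
// One indexing run: wall-clock time plus call edges written as "caller -> callee".
interface IndexRun {
  label: string;
  wallClockMs: number;
  edges: string[];
}

interface Comparison {
  speedup: number;   // baseline time divided by candidate time
  missing: string[]; // edges only the baseline found
  extra: string[];   // edges only the candidate found
  jaccard: number;   // shared edges divided by all distinct edges
}

function compareRuns(baseline: IndexRun, candidate: IndexRun): Comparison {
  const base = new Set(baseline.edges);
  const cand = new Set(candidate.edges);
  const distinctBase = Array.from(base);
  const distinctCand = Array.from(cand);
  const missing = distinctBase.filter((e) => !cand.has(e)).sort();
  const extra = distinctCand.filter((e) => !base.has(e)).sort();
  const shared = distinctBase.length - missing.length;
  const union = distinctBase.length + extra.length;
  return {
    speedup: baseline.wallClockMs / candidate.wallClockMs,
    missing,
    extra,
    jaccard: union === 0 ? 1 : shared / union,
  };
}

// Placeholder values for illustration only; these are not results from the paper.
const lspRun: IndexRun = {
  label: "lsp-augmented",
  wallClockMs: 1000,
  edges: ["a -> b", "a -> c", "b -> d"],
};
const compilerRun: IndexRun = {
  label: "compiler-api",
  wallClockMs: 250,
  edges: ["a -> b", "b -> d", "b -> e"],
};

const report = compareRuns(lspRun, compilerRun);
console.log(report);
```

Reporting `missing` and `extra` separately matters here: a fast parser that silently drops edges would show a high speedup and a non-empty `missing` list, which a single agreement score could hide.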
Original abstract
Graph-based code indexing can improve context retrieval for LLM-based code agents by preserving call chains and dependency relationships that keyword search and similarity retrieval often miss. ABCoder is an open-source framework that parses codebases into a function-level code index called UniAST. Its existing parsers combine lightweight AST parsers for syntactic analysis with language servers for semantic resolution, but because LSP-based resolution requires a JSON-RPC call for each symbol lookup, these per-symbol calls become a bottleneck on large TypeScript repositories. We present abcoder-ts-parser, a TypeScript parser built on the TypeScript Compiler API that works directly with the compiler's AST, semantic information, and module resolution logic. We evaluate the parser on three open-source TypeScript projects with up to 1.2 million lines of code and find that it produces reliable indexes significantly more efficiently than the existing architecture. For a live demonstration, watch: https://youtu.be/ryssr7ouvdE
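For a sense of what one JSON-RPC call per symbol lookup means on the wire, the sketch below frames one Language Server Protocol request for each symbol reference. The `Content-Length` framing and the `textDocument/definition` method come from the LSP specification. The assumption that ABCoder's existing parsers issue this particular request is ours, and the `SymbolRef` type, the `frameDefinitionRequest` helper, and the sample URIs are hypothetical.

```typescript
// Hypothetical position of a symbol that an indexer wants resolved.
interface SymbolRef {
  uri: string;
  line: number;      // zero-based, as in LSP
  character: number; // zero-based, as in LSP
}

// Build one framed LSP request: a Content-Length header, a blank line, then the JSON body.
function frameDefinitionRequest(id: number, ref: SymbolRef): string {
  const body = JSON.stringify({
    jsonrpc: "2.0",
    id,
    method: "textDocument/definition",
    params: {
      textDocument: { uri: ref.uri },
      position: { line: ref.line, character: ref.character },
    },
  });
  // Content-Length counts bytes of the encoded body, not UTF-16 code units.
  const byteLength = new TextEncoder().encode(body).length;
  return `Content-Length: ${byteLength}\r\n\r\n${body}`;
}

// Made-up references; a large repository would have many more.
const refs: SymbolRef[] = [
  { uri: "file:///repo/src/a.ts", line: 1, character: 32 },
  { uri: "file:///repo/src/a.ts", line: 7, character: 4 },
  { uri: "file:///repo/src/b.ts", line: 3, character: 10 },
];

// One request per symbol: the message count grows linearly with the number of lookups.
const messages = refs.map((ref, index) => frameDefinitionRequest(index + 1, ref));
console.log(`${messages.length} requests for ${refs.length} symbols`);
```

Each message also implies a serialize, send, wait, and parse cycle on both sides, which is the overhead an in-process compiler query avoids.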
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents abcoder-ts-parser, a TypeScript parser built on the TypeScript Compiler API (including its AST, semantic checker, and module resolver) to generate UniAST indexes for the ABCoder framework. It argues that this replaces the prior combination of lightweight AST parsers plus per-symbol LSP calls, which created bottlenecks on large repositories, and claims the new parser produces reliable indexes significantly more efficiently, as shown by evaluation on three open-source TypeScript projects with up to 1.2 million lines of code.
Significance. If the efficiency gains and index reliability hold, the work would provide a practical, scalable improvement to graph-based code indexing for LLM-based code agents, directly addressing a performance limitation in the existing ABCoder architecture for TypeScript codebases.
major comments (3)
- [Abstract / Evaluation] Abstract and Evaluation section: The central claim that the parser 'produces reliable indexes significantly more efficiently' is unsupported by any quantitative metrics, baselines, runtime numbers, accuracy measures, or error analysis; the abstract supplies only the project sizes and a qualitative assertion.
- [Motivation / §3] Motivation and §3 (Parser Design): The paper's own motivation notes that lightweight AST parsers had completeness gaps in semantic relationships and call chains, motivating LSP use; however, no verification (e.g., cross-file reference counts, call-graph edge comparison, or tests on tsconfig paths/declaration merging) is provided to confirm the Compiler API version achieves equivalent resolution.
- [Evaluation] Evaluation section: The claim of evaluation 'on three open-source TypeScript projects' lacks any description of the methodology, selected projects' characteristics (beyond LOC), or how 'reliability' was assessed relative to the prior LSP-augmented parsers.
minor comments (2)
- [Abstract] The live demo video link is helpful but the manuscript should include at least one self-contained code example or index snippet to illustrate the UniAST output.
- [§2 / §3] Notation for UniAST and the index structure could be clarified with a small diagram or table of node/edge types.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which highlight important areas where the manuscript can be strengthened. We address each major comment below and will incorporate the suggested improvements in the revised version.
- Referee: [Abstract / Evaluation] Abstract and Evaluation section: The central claim that the parser 'produces reliable indexes significantly more efficiently' is unsupported by any quantitative metrics, baselines, runtime numbers, accuracy measures, or error analysis; the abstract supplies only the project sizes and a qualitative assertion.
  Authors: We agree that the abstract and Evaluation section currently rely on a qualitative assertion without supporting quantitative evidence. In the revised manuscript we will expand the abstract to reference key efficiency metrics and add to the Evaluation section concrete runtime comparisons against the prior LSP-augmented parser, indexing throughput numbers, accuracy measures, and error analysis drawn from the experiments on the three projects.
  Revision: yes
- Referee: [Motivation / §3] Motivation and §3 (Parser Design): The paper's own motivation notes that lightweight AST parsers had completeness gaps in semantic relationships and call chains, motivating LSP use; however, no verification (e.g., cross-file reference counts, call-graph edge comparison, or tests on tsconfig paths/declaration merging) is provided to confirm the Compiler API version achieves equivalent resolution.
  Authors: The motivation correctly identifies the semantic gaps that prompted LSP usage. Our design replaces per-symbol LSP calls with the Compiler API's native semantic checker and module resolver. We acknowledge that explicit verification would strengthen the equivalence claim; we will add a dedicated subsection (or expand §3) containing cross-file reference counts, call-graph edge comparisons, and targeted tests for tsconfig path resolution and declaration merging.
  Revision: yes
- Referee: [Evaluation] Evaluation section: The claim of evaluation 'on three open-source TypeScript projects' lacks any description of the methodology, selected projects' characteristics (beyond LOC), or how 'reliability' was assessed relative to the prior LSP-augmented parsers.
  Authors: We will substantially revise the Evaluation section to describe the evaluation methodology in detail, provide additional characteristics of the three projects (domain, architectural features, and TypeScript-specific constructs exercised), and explain how reliability was assessed, including direct side-by-side comparisons of reference resolution and call-chain completeness against the prior LSP-augmented parsers.
  Revision: yes
Circularity Check
No circularity: direct implementation and empirical comparison with no derivations or self-referential reductions
The paper presents an engineering implementation of abcoder-ts-parser using the TypeScript Compiler API, replacing LSP-based resolution in the existing ABCoder framework, followed by runtime and scalability evaluation on three TypeScript projects. No equations, parameter fitting, uniqueness theorems, or ansatzes are present. Claims of 'reliable indexes' and efficiency gains rest on direct benchmarking against the prior architecture rather than any self-definitional loop or fitted-input prediction. The work is self-contained as a systems contribution without load-bearing self-citations that reduce the central result to prior unverified assertions by the same authors.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The TypeScript Compiler API provides equivalent or superior semantic and dependency information to LSP for function-level indexing without requiring per-symbol RPC calls.