pith. machine review for the scientific record.

arxiv: 2604.02702 · v1 · submitted 2026-04-03 · 💻 cs.SE · cs.PL

Recognition: 1 theorem link

· Lean Theorem

TypePro: Boosting LLM-Based Type Inference via Inter-Procedural Slicing


Pith reviewed 2026-05-13 20:27 UTC · model grok-4.3

classification 💻 cs.SE cs.PL
keywords: type inference · large language models · inter-procedural slicing · dynamic languages · Python · TypeScript · code analysis

The pith

Inter-procedural slicing supplies richer context that lifts LLM type inference accuracy by 7 to 10 points on dynamic-language benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TypePro, a technique that applies large language models to infer types in dynamic languages such as Python and TypeScript. Prior LLM approaches used only the direct dependencies of a variable, which often left the model without enough surrounding code to choose complex types correctly. TypePro instead performs inter-procedural slicing to collect broader context and then derives candidate complex types directly from the structural patterns visible in those slices. On the ManyTypes4Py and ManyTypes4TypeScript datasets this produces Top-1 exact-match rates of 88.9 percent and 86.6 percent, respectively, exceeding the previous best method by 7.1 and 10.3 percentage points. The gains occur without any domain-specific fine-tuning of the underlying language model.

Core claim

TypePro supplements the limited direct-dependency context used by prior LLM-based type inference with inter-procedural slices. It then extracts structural information about data types from those slices to propose a set of candidate complex types, allowing the LLM to select the correct type without domain-specific training.

What carries the argument

Inter-procedural code slicing that extracts structural type information from the slices to generate a focused set of candidate complex types for the LLM.
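A minimal sketch of how structural cues in a slice might be turned into candidate complex types (the function, its three rules, and the example slice are illustrative assumptions, not TypePro's published algorithm):

```python
import ast

def candidate_types(slice_source: str, var: str) -> set[str]:
    """Illustrative heuristic: derive candidate complex types for `var`
    from structural patterns visible in an inter-procedural slice."""
    candidates = set()
    tree = ast.parse(slice_source)
    for node in ast.walk(tree):
        # var["key"] with a string literal suggests a Dict[str, ...]
        if (isinstance(node, ast.Subscript)
                and isinstance(node.value, ast.Name) and node.value.id == var):
            if isinstance(node.slice, ast.Constant) and isinstance(node.slice.value, str):
                candidates.add("Dict[str, Any]")
            else:
                candidates.add("List[Any]")
        # `for x in var:` suggests an iterable container
        if (isinstance(node, ast.For)
                and isinstance(node.iter, ast.Name) and node.iter.id == var):
            candidates.add("Iterable[Any]")
    return candidates

# Hypothetical slice collected across two procedures:
snippet = """
def load(cfg):
    return cfg["path"]

def run(cfg):
    for key in cfg:
        print(key)
"""
print(sorted(candidate_types(snippet, "cfg")))
```

A real implementation would cover many more patterns (call signatures, return values, attribute access) and feed the candidate set into the prompt for the LLM to choose among.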

If this is right

  • LLM-based type inference can handle complex types in dynamic languages without requiring language-specific fine-tuning or heavy static analysis.
  • The reported 7-10 point gains on standard benchmarks would reduce the number of type-related runtime errors that reach production code.
  • Because the method relies only on slicing and structural cues, it can be applied to other dynamic languages once suitable slicing tools exist.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar context-augmentation steps could raise LLM performance on related software-engineering tasks such as defect prediction or API recommendation.
  • In very large repositories, the cost and precision of computing inter-procedural slices may become the practical bottleneck rather than the LLM call itself.
  • The approach could be tested on live open-source projects to check whether the accuracy lift persists when code contains more incomplete or noisy dependencies than the curated datasets.

Load-bearing premise

Inter-procedural slices contain relevant, low-noise context whose structural patterns suffice for an off-the-shelf LLM to select the right complex type.

What would settle it

Replace the inter-procedural slices with random or unrelated code fragments of similar length and measure whether the Top-1 exact-match rate falls back to the level of earlier LLM baselines.
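Such a control could be run with a harness along these lines (a sketch; the corpus, prompts, and length-matching rule are invented for illustration, and the LLM call itself is omitted):

```python
import random

def length_matched_distractor(corpus: list[str], slice_text: str,
                              rng: random.Random) -> str:
    """Pick an unrelated fragment whose length is closest to the real
    slice, so any accuracy drop is attributable to content, not to a
    shorter prompt; ties are broken randomly."""
    target = len(slice_text)
    return min(corpus, key=lambda frag: (abs(len(frag) - target), rng.random()))

def build_prompt(var: str, context: str) -> str:
    return f"# context\n{context}\n# question\nWhat is the type of `{var}`?"

rng = random.Random(0)
real_slice = "def f(xs):\n    return sum(xs) / len(xs)"
corpus = ["print('hello')", "y = {'a': 1}\nz = y['a'] + 2\nprint(z)", "import os"]
distractor = length_matched_distractor(corpus, real_slice, rng)
ablated_prompt = build_prompt("xs", distractor)
```

If Top-1 accuracy with the distractor context falls back to baseline levels, the slices, not mere prompt length, are doing the work.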

Figures

Figures reproduced from arXiv: 2604.02702 by Huaxun Huang, Minghao Fan, Rongxin Wu, Teyu Lin, Zhirong Shen.

Figure 1. The motivating example, where model serves as the target variable for data type inference.
Figure 2. The slices of Figure …
Figure 3. The overview of TypePro.
Figure 4. The prompt generated for the example in Figure …
Figure 5. Correct answer distribution between TypePro and baselines.
read the original abstract

Dynamic languages (such as Python and JavaScript) offer flexibility and simplified type handling for programming, but this can also lead to an increase in type-related errors and additional overhead for compile-time type inference. As a result, type inference for dynamic languages has become a popular research area. Existing approaches typically achieve type inference through static analysis, machine learning, or large language models (LLMs). However, current work only focuses on the direct dependencies of variables related to type inference as the context, resulting in incomplete contextual information and thus affecting the accuracy of type inference. To address this issue, this paper proposes a method called TypePro, which leverages LLMs for type inference in dynamic languages. TypePro supplements contextual information by conducting inter-procedural code slicing. Then, TypePro proposes a set of candidate complex types based on the structural information of data types implied in the slices, thereby addressing the lack of domain knowledge of LLMs. We conducted experiments on the ManyTypes4Py and ManyTypes4TypeScript datasets, achieving Top-1 exact match (EM) rates of 88.9% and 86.6%, respectively. Notably, TypePro improves the Top-1 Exact Match by 7.1 and 10.3 percentage points over the second-best approach, showing the effectiveness and robustness of TypePro.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes TypePro, a technique that augments LLM-based type inference for dynamic languages (Python, TypeScript) by performing inter-procedural code slicing to enrich context and by deriving candidate complex types from the structural information present in those slices. Experiments on the ManyTypes4Py and ManyTypes4TypeScript datasets report Top-1 exact-match accuracies of 88.9% and 86.6%, respectively, corresponding to gains of 7.1 and 10.3 percentage points over the second-best baseline.

Significance. If the reported gains are reproducible, the work demonstrates a practical way to inject domain-specific structural knowledge into LLMs without fine-tuning, addressing a known limitation of pure LLM type inference. The use of publicly available datasets and explicit comparison to prior methods supports potential impact on developer tooling for dynamic languages.

major comments (2)
  1. [Section 4, Method] The claim that candidate-type generation from slice structure supplies the missing domain knowledge is central, yet the precise rules or heuristics that map slice structure to candidate types are described only at a high level; without an explicit algorithm or a worked example it is difficult to verify that the mechanism is deterministic and non-circular with respect to the LLM prompt.
  2. [Section 5, Experiments] While the Top-1 EM improvements are stated, the manuscript reports no variance across random seeds, no statistical significance tests, and no ablation isolating inter-procedural slicing from intra-procedural context; these omissions make it hard to judge whether the 7.1–10.3 pp gains are robust and attributable to the proposed slicing step.
minor comments (2)
  1. [Introduction] The abstract and introduction use the term 'inter-procedural slicing' without an early reference to the exact slicing algorithm (e.g., Weiser-style or PDG-based); a citation or brief definition in §2 would improve readability.
  2. [Figure 2] Figure 2 (overview diagram) would benefit from explicit annotation of the slice extraction step and the candidate-generation step to match the textual description.
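For context on the minor comment above: a PDG-based backward slice reduces to graph reachability over dependence edges. A toy sketch, with a hand-written dependence map rather than one derived from real code analysis:

```python
from collections import deque

def backward_slice(deps: dict[int, set[int]], criterion: int) -> set[int]:
    """PDG-style backward slice: every statement that the slicing
    criterion transitively depends on (data and control edges are
    collapsed into a single `deps` map)."""
    seen = {criterion}
    work = deque([criterion])
    while work:
        for pred in deps.get(work.popleft(), set()):
            if pred not in seen:
                seen.add(pred)
                work.append(pred)
    return seen

# Toy program, one statement per line number:
# 1: n = read()   2: total = 0   3: i = 0
# 4: total += i   5: i += 1      6: print(total)
deps = {4: {2, 5}, 5: {3}, 6: {4}}
print(sorted(backward_slice(deps, 6)))  # statements relevant to line 6
```

Note that line 1 is excluded: the printed value never depends on it, which is exactly the noise reduction that slicing-based context selection relies on.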

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and will incorporate the suggested improvements in the revised manuscript.

read point-by-point responses
  1. Referee: [Section 4, Method] The claim that candidate-type generation from slice structure supplies the missing domain knowledge is central, yet the precise rules or heuristics that map slice structure to candidate types are described only at a high level; without an explicit algorithm or a worked example it is difficult to verify that the mechanism is deterministic and non-circular with respect to the LLM prompt.

    Authors: We agree that the description in Section 4 is high-level. In the revised manuscript we will add an explicit algorithm (pseudocode) that details the deterministic heuristics for deriving candidate types from slice structure, together with a worked example. The generation step relies solely on static structural patterns extracted from the inter-procedural slice (e.g., variable assignments, call sites, and type-related literals) and is performed before any LLM prompting, ensuring it is independent and non-circular. revision: yes

  2. Referee: [Section 5, Experiments] While the Top-1 EM improvements are stated, the manuscript reports no variance across random seeds, no statistical significance tests, and no ablation isolating inter-procedural slicing from intra-procedural context; these omissions make it hard to judge whether the 7.1–10.3 pp gains are robust and attributable to the proposed slicing step.

    Authors: We acknowledge that these additional analyses would strengthen the evaluation. In the revised manuscript we will report mean and standard deviation across multiple random seeds, include statistical significance tests (paired t-tests) against baselines, and add an ablation comparing the full inter-procedural approach against an intra-procedural-only variant. This will isolate the contribution of inter-procedural slicing to the reported gains. revision: yes
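The promised per-seed robustness analysis reduces to a few lines. A minimal sketch with invented placeholder accuracies (not data from the paper), using a hand-rolled paired t statistic:

```python
import math
import statistics

def paired_t(xs: list[float], ys: list[float]) -> float:
    """Paired t statistic over per-seed accuracies of two systems:
    mean of the differences divided by its standard error."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Hypothetical Top-1 EM per random seed (placeholders, not paper data):
typepro  = [0.889, 0.885, 0.892, 0.887, 0.890]
baseline = [0.818, 0.815, 0.821, 0.816, 0.820]
t = paired_t(typepro, baseline)
print(f"mean gain = {statistics.mean(typepro) - statistics.mean(baseline):.3f}")
```

With n - 1 = 4 degrees of freedom, a t statistic this far above the ~2.8 critical value (α = 0.05, two-tailed) would establish that a gain of this size is not seed noise.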

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents an empirical method (TypePro) that augments LLM type inference via inter-procedural slicing for added context and candidate-type generation from slice structure. All reported gains (7.1 pp and 10.3 pp Top-1 EM on ManyTypes4Py and ManyTypes4TypeScript) are framed as direct experimental outcomes on named public datasets. No equations, fitted parameters, predictions that reduce to inputs, or load-bearing self-citations appear in the provided text. The evaluation protocol is externally falsifiable and does not rely on internal re-derivation of its own results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract mentions no free parameters, axioms, or new invented entities; the method is presented as an engineering combination of existing slicing and LLM capabilities.

pith-pipeline@v0.9.0 · 5548 in / 976 out tokens · 63164 ms · 2026-05-13T20:27:23.824650+00:00 · methodology


Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 3 internal anchors

  1. [1]

    2025. ChatGPT. https://chat.chatbot.app/gpt4

  2. [2]

    2025. Claude Sonnet 4.6. https://www.anthropic.com/claude/sonnet

  3. [3]

    2025. Mypy: Static Typing for Python. https://github.com/python/mypy/

  4. [4]

    2025. Qwen Chat. https://chat.qwen.ai/

  5. [5]

    Miltiadis Allamanis, Earl T. Barr, Soline Ducousso, and Zheng Gao. 2020. Typilus: neural type hints. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (London, UK) (PLDI 2020). Association for Computing Machinery, New York, NY, USA, 91–105. doi:10.1145/3385412.3385997

  6. [6]

    Gavin Bierman, Martín Abadi, and Mads Torgersen. 2014. Understanding TypeScript. In European Conference on Object-Oriented Programming. Springer, 257–281

  7. [7]

    Hanting Chen, Yasheng Wang, Kai Han, Dong Li, Lin Li, Zhenni Bi, Jinpeng Li, Haoyu Wang, Fei Mi, Mingjian Zhu, et al. 2025. Pangu Embedded: An efficient dual-system LLM reasoner with metacognition. arXiv preprint arXiv:2505.22375 (2025)

  8. [8]

    Michael Emmi and Constantin Enea. 2016. Symbolic abstract data type inference. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (St. Petersburg, FL, USA) (POPL '16). Association for Computing Machinery, New York, NY, USA, 513–525. doi:10.1145/2837614.2837645

  9. [9]

    Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, and Mike Lewis. 2023. InCoder: A Generative Model for Code Infilling and Synthesis. arXiv:2204.05999 [cs.SE] https://arxiv.org/abs/2204.05999

  10. [10]

    Google. 2023. pytype - A Static Type Analyzer for Python Code. https://github.com/google/pytype

  11. [11]

    Salvatore Guarnieri and V. Benjamin Livshits. 2009. GATEKEEPER: Mostly Static Enforcement of Security and Reliability Policies for JavaScript Code. In USENIX Security Symposium, Vol. 10. 78–85

  12. [12]

    Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, and Jian Yin. 2022. UniXcoder: Unified Cross-Modal Pre-training for Code Representation. arXiv:2203.03850 [cs.CL] https://arxiv.org/abs/2203.03850

  13. [13]

    Yimeng Guo, Zhifei Chen, Lin Chen, Wenjie Xu, Yanhui Li, Yuming Zhou, and Baowen Xu. 2024. Generating Python Type Annotations from Type Inference: How Far Are We? ACM Trans. Softw. Eng. Methodol. 33, 5, Article 123 (June 2024), 38 pages. doi:10.1145/3652153

  14. [14]

    Vincent J. Hellendoorn, Christian Bird, Earl T. Barr, and Miltiadis Allamanis. 2018. Deep learning type inference. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 152–162

  15. [15]

    Susan Horwitz, Thomas Reps, and David Binkley. 1990. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems (TOPLAS) 12, 1 (1990), 26–60

  16. [16]

    Hugging Face Inc. 2025. Hugging Face Hub: A platform for sharing machine learning models, datasets and demos. https://huggingface.co/

  17. [17]

    Kevin Jesse and Premkumar T. Devanbu. 2022. ManyTypes4TypeScript: a comprehensive TypeScript dataset for sequence-based type inference. In Proceedings of the 19th International Conference on Mining Software Repositories (Pittsburgh, Pennsylvania) (MSR '22). Association for Computing Machinery, New York, NY, USA, 294–298. doi:10.1145/3524842.3528507

  18. [18]

    Christopher D. Manning, Hinrich Schütze, and Prabhakar Raghavan. 2008. Introduction to Information Retrieval. Cambridge University Press, Cambridge, UK

  19. [19]

    Microsoft. 2023. Pyright - Static Type Checker for Python. https://github.com/microsoft/pyright

  20. [20]

    Amir M. Mir, Evaldas Latoškinas, and Georgios Gousios. 2021. ManyTypes4Py: A benchmark Python dataset for machine learning-based type inference. In 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR). IEEE, 585–589

  21. [21]

    Amir M. Mir, Evaldas Latoškinas, Sebastian Proksch, and Georgios Gousios. 2022. Type4Py: Practical deep similarity learning-based type inference for Python. In Proceedings of the 44th International Conference on Software Engineering. 2241–2252

  22. [22]

    Chris Parnin and Alessandro Orso. 2011. Are automated debugging techniques actually helping programmers? In Proceedings of the 2011 International Symposium on Software Testing and Analysis. 199–209

  23. [23]

    Zvonimir Pavlinovic, Yusen Su, and Thomas Wies. 2021. Data flow refinement type inference. Proc. ACM Program. Lang. 5, POPL, Article 19 (Jan. 2021), 31 pages. doi:10.1145/3434300

  24. [24]

    Yun Peng, Cuiyun Gao, Zongjie Li, Bowei Gao, David Lo, Qirun Zhang, and Michael Lyu. 2022. Static inference meets deep learning: a hybrid type inference approach for Python. In Proceedings of the 44th International Conference on Software Engineering. 2019–2030

  25. [25]

    Yun Peng, Chaozheng Wang, Wenxuan Wang, Cuiyun Gao, and Michael R. Lyu. 2023. Generative type inference for Python. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 988–999

  26. [26]

    Stephen Robertson and Hugo Zaragoza. 2009. The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends® in Information Retrieval 3, 4 (2009), 333–389. doi:10.1561/1500000019

  27. [27]

    Vitalis Salis, Thodoris Sotiropoulos, Panos Louridas, Diomidis Spinellis, and Dimitris Mitropoulos. 2021. PyCG: Practical call graph generation in Python. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 1646–1657

  28. [28]

    Cristian-Alexandru Staicu, Michael Pradel, and Ben Livshits. 2018. Understanding and automatically preventing injection attacks on Node.js. In Network and Distributed System Security Symposium (NDSS)

  29. [29]

    Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023)

  30. [30]

    Guido van Rossum, Jukka Lehtosalo, and Łukasz Langa. 2014. PEP 484 – Type Hints. https://peps.python.org/pep-0484/. Accessed: 2025-08-27

  31. [31]

    Chong Wang, Jian Zhang, Yiling Lou, Mingwei Liu, Weisong Sun, Yang Liu, and Xin Peng. 2024. Tiger: A generating-then-ranking framework for practical Python type inference. arXiv preprint arXiv:2407.02095 (2024)

  32. [32]

    Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D. Q. Bui, Junnan Li, and Steven C. H. Hoi. 2023. CodeT5+: Open code large language models for code understanding and generation. arXiv preprint arXiv:2305.07922 (2023)

  33. [33]

    Yue Wang, Weishi Wang, Shafiq Joty, and Steven C. H. Hoi. 2021. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. arXiv preprint arXiv:2109.00859 (2021)

  34. [34]

    Jiayi Wei, Maruth Goyal, Greg Durrett, and Isil Dillig. 2020. LambdaNet: Probabilistic Type Inference using Graph Neural Networks. arXiv:2005.02161 [cs.PL] https://arxiv.org/abs/2005.02161

  35. [35]

    Zhaogui Xu, Xiangyu Zhang, Lin Chen, Kexin Pei, and Baowen Xu. 2016. Python probabilistic type inference with natural language support. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 607–618

  36. [36]

    Yanyan Yan, Yang Feng, Hongcheng Fan, and Baowen Xu. 2023. DLInfer: Deep learning with static slicing for Python type inference. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2009–2021

  37. [37]

    Ming-Ho Yee and Arjun Guha. 2023. Do Machine Learning Models Produce TypeScript Types That Type Check? Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPICS.ECOOP.2023.37