TypePro: Boosting LLM-Based Type Inference via Inter-Procedural Slicing
Pith reviewed 2026-05-13 20:27 UTC · model grok-4.3
The pith
Inter-procedural slicing supplies richer context that lifts LLM type inference accuracy by 7.1 and 10.3 percentage points in Top-1 exact match over the second-best approach on ManyTypes4Py and ManyTypes4TypeScript, respectively.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TypePro supplements the limited direct-dependency context used by prior LLM-based type inference with inter-procedural slices. It then extracts structural information about data types from those slices to propose a set of candidate complex types, allowing the LLM to select the correct type without domain-specific training.
What carries the argument
Inter-procedural code slicing, followed by extraction of structural type information from the resulting slices to generate a focused set of candidate complex types for the LLM.
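To make the idea concrete, here is a minimal sketch of how structural cues in an inter-procedural slice might be turned into candidate types. It is not TypePro's algorithm, which the paper describes only at a high level. The helper names (`coarse_type`, `resolve`, `candidate_types`), the toy `SOURCE` module, and the rule set are all assumptions made for illustration. The sketch follows one parameter back through its call sites, through local assignments in the caller, and into the return statements of the callee that produced the value.

```python
import ast

# Toy module (hypothetical): we want candidate types for `data` in `summarize`.
SOURCE = '''
def load(path):
    return {"rows": [1, 2, 3], "name": "demo"}

def summarize(data):
    return len(data["rows"])

def main():
    cfg = load("a.csv")
    return summarize(cfg)
'''


def union(exprs):
    """Join the coarse types of several expressions into a sorted union."""
    kinds = sorted({coarse_type(e) for e in exprs})
    return " | ".join(kinds) if kinds else "Any"


def coarse_type(expr):
    """Map a literal expression to a structural type string, else 'Any'."""
    if isinstance(expr, ast.Constant):
        return type(expr.value).__name__
    if isinstance(expr, ast.List):
        return f"list[{union(expr.elts)}]"
    if isinstance(expr, ast.Dict):
        keys = [k for k in expr.keys if k is not None]  # skip ** unpacking
        return f"dict[{union(keys)}, {union(expr.values)}]"
    return "Any"


def resolve(expr, assigned, funcs, depth=0):
    """Follow a name to its assignment, and a call into the callee's returns."""
    if depth > 8:  # guard against cyclic assignments such as `x = x`
        return {"Any"}
    if isinstance(expr, ast.Name) and expr.id in assigned:
        return resolve(assigned[expr.id], assigned, funcs, depth + 1)
    if (isinstance(expr, ast.Call) and isinstance(expr.func, ast.Name)
            and expr.func.id in funcs):
        # Inter-procedural step: look at what the callee returns.
        callee = funcs[expr.func.id]
        return {coarse_type(n.value) for n in ast.walk(callee)
                if isinstance(n, ast.Return) and n.value is not None}
    return {coarse_type(expr)}


def candidate_types(source, func_name, param):
    """Collect candidate types for `param` from every call site of `func_name`."""
    tree = ast.parse(source)
    funcs = {n.name: n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    index = [a.arg for a in funcs[func_name].args.args].index(param)
    candidates = set()
    for caller in funcs.values():
        # Simple name -> value map for assignments inside the caller.
        assigned = {t.id: node.value
                    for node in ast.walk(caller) if isinstance(node, ast.Assign)
                    for t in node.targets if isinstance(t, ast.Name)}
        for node in ast.walk(caller):
            if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                    and node.func.id == func_name and len(node.args) > index):
                candidates |= resolve(node.args[index], assigned, funcs)
    return sorted(candidates)


print(candidate_types(SOURCE, "summarize", "data"))
```

On the toy module this yields the single candidate `dict[str, list[int] | str]`, which could then be offered to the LLM as one of the options to choose from. Because the candidates are computed before any prompt is built, a step of this shape would be deterministic and independent of the model.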
If this is right
- LLM-based type inference can handle complex types in dynamic languages without requiring language-specific fine-tuning or heavy static analysis.
- The reported 7.1 and 10.3 percentage-point gains, if they carry over to real projects, could reduce the number of type-related runtime errors that reach production code.
- Because the method relies only on slicing and structural cues, it can be applied to other dynamic languages once suitable slicing tools exist.
Where Pith is reading between the lines
- Similar context-augmentation steps could raise LLM performance on related software-engineering tasks such as defect prediction or API recommendation.
- In very large repositories, the cost and precision of computing inter-procedural slices may become the practical bottleneck rather than the LLM call itself.
- The approach could be tested on live open-source projects to check whether the accuracy lift persists when code contains more incomplete or noisy dependencies than the curated datasets.
Load-bearing premise
Inter-procedural slices contain relevant, low-noise context, and the structural patterns in that context are sufficient for an off-the-shelf LLM to select the correct complex type.
What would settle it
Replace the inter-procedural slices with random or unrelated code fragments of similar length and measure whether the Top-1 exact-match rate falls back to the level of earlier LLM baselines.
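The control experiment described above can be sketched as a small harness. This is an illustration under stated assumptions, not the paper's evaluation code: `infer_type` is a hypothetical callable standing in for the LLM call, exact match is taken to be string equality after whitespace stripping, and "similar length" is approximated by line count.

```python
import random


def top1_exact_match(predictions, gold):
    """Fraction of items whose top-ranked prediction equals the gold type."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("empty evaluation set")
    hits = sum(p.strip() == g.strip() for p, g in zip(predictions, gold))
    return hits / len(gold)


def length_matched_controls(slices, pool, rng):
    """Swap each slice for an unrelated fragment with the closest line count.

    `pool` must contain at least one fragment different from each slice.
    """
    controls = []
    for s in slices:
        target = len(s.splitlines())
        others = [f for f in pool if f != s]
        best = min(abs(len(f.splitlines()) - target) for f in others)
        closest = [f for f in others
                   if abs(len(f.splitlines()) - target) == best]
        controls.append(rng.choice(closest))
    return controls


def run_condition(infer_type, targets, contexts, gold):
    """Score one condition; `infer_type(target, context)` is hypothetical."""
    predictions = [infer_type(t, c) for t, c in zip(targets, contexts)]
    return top1_exact_match(predictions, gold)
```

Running `run_condition` once with the real slices and once with the output of `length_matched_controls` gives the two numbers to compare. If the control score stays close to the real-slice score, the gain would be hard to attribute to slice content rather than to prompt length.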
Original abstract
Dynamic languages (such as Python and JavaScript) offer flexibility and simplified type handling for programming, but this can also lead to an increase in type-related errors and additional overhead for compile-time type inference. As a result, type inference for dynamic languages has become a popular research area. Existing approaches typically achieve type inference through static analysis, machine learning, or large language models (LLMs). However, current work only focuses on the direct dependencies of variables related to type inference as the context, resulting in incomplete contextual information and thus affecting the accuracy of type inference. To address this issue, this paper proposes a method called TypePro, which leverages LLMs for type inference in dynamic languages. TypePro supplements contextual information by conducting inter-procedural code slicing. Then, TypePro proposes a set of candidate complex types based on the structural information of data types implied in the slices, thereby addressing the lack of domain knowledge of LLMs. We conducted experiments on the ManyTypes4Py and ManyTypes4TypeScript datasets, achieving Top-1 exact match (EM) rates of 88.9% and 86.6%, respectively. Notably, TypePro improves the Top-1 Exact Match by 7.1 and 10.3 percentage points over the second-best approach, showing the effectiveness and robustness of TypePro.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes TypePro, a technique that augments LLM-based type inference for dynamic languages (evaluated on Python and TypeScript datasets) by performing inter-procedural code slicing to enrich context and by deriving candidate complex types from the structural information present in those slices. Experiments on the ManyTypes4Py and ManyTypes4TypeScript datasets report Top-1 exact-match accuracies of 88.9% and 86.6%, respectively, corresponding to gains of 7.1 and 10.3 percentage points over the second-best baseline.
Significance. If the reported gains are reproducible, the work demonstrates a practical way to inject domain-specific structural knowledge into LLMs without fine-tuning, addressing a known limitation of pure LLM type inference. The use of publicly available datasets and explicit comparison to prior methods supports potential impact on developer tooling for dynamic languages.
major comments (2)
- [Section 4] Section 4 (Method): the claim that candidate-type generation from slice structure supplies the missing domain knowledge is central, yet the precise rules or heuristics that map slice structure to candidate types are described only at a high level; without an explicit algorithm or worked example it is difficult to verify that the mechanism is deterministic and non-circular with the LLM prompt.
- [Section 5] Section 5 (Experiments): while the Top-1 EM improvements are stated, the manuscript does not report variance across random seeds, statistical significance tests, or an ablation that isolates inter-procedural slicing from intra-procedural context; these omissions make it hard to judge whether the 7.1–10.3 pp gains are robust or attributable to the proposed slicing step.
minor comments (2)
- [Introduction] The abstract and introduction use the term 'inter-procedural slicing' without an early reference to the exact slicing algorithm (e.g., Weiser-style or PDG-based); a citation or brief definition in §2 would improve readability.
- [Figure 2] Figure 2 (overview diagram) would benefit from explicit annotation of the slice extraction step and the candidate-generation step to match the textual description.
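For readers unfamiliar with the term raised in the first minor comment, a dependence-graph-based backward slice can be pictured as reachability over "depends on" edges that are allowed to cross function boundaries. The sketch below is a deliberately simplified, context-insensitive version. The node labels and the `DEPENDS_ON` graph are hypothetical, and a precise inter-procedural slicer would also need to match calls with their returns, which this does not attempt.

```python
from collections import deque


def backward_slice(depends_on, criterion):
    """Return every node the slicing criterion transitively depends on."""
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        node = queue.popleft()
        for dep in depends_on.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical dependence graph: each node maps to the nodes it depends on.
# Edges cross from `summarize` into `main` and from `main` into `load`.
DEPENDS_ON = {
    "summarize: param data": ["main: call summarize(cfg)"],
    "main: call summarize(cfg)": ["main: cfg = load('a.csv')"],
    "main: cfg = load('a.csv')": ["load: return dict literal"],
    "load: return dict literal": ["load: param path"],
    "main: log = open('run.log')": [],  # unrelated statement, not in the slice
}

slice_nodes = backward_slice(DEPENDS_ON, "summarize: param data")
print(sorted(slice_nodes))
```

The unrelated `log` statement is excluded because nothing on the path to the criterion depends on it, which is the property that lets a slice act as focused context instead of a whole-file dump.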
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below and will incorporate the suggested improvements in the revised manuscript.
- Referee: [Section 4] Section 4 (Method): the claim that candidate-type generation from slice structure supplies the missing domain knowledge is central, yet the precise rules or heuristics that map slice structure to candidate types are described only at a high level; without an explicit algorithm or worked example it is difficult to verify that the mechanism is deterministic and non-circular with the LLM prompt.
  Authors: We agree that the description in Section 4 is high-level. In the revised manuscript we will add an explicit algorithm (pseudocode) that details the deterministic heuristics for deriving candidate types from slice structure, together with a worked example. The generation step relies solely on static structural patterns extracted from the inter-procedural slice (e.g., variable assignments, call sites, and type-related literals) and is performed before any LLM prompting, ensuring it is independent and non-circular.
  Revision: yes
- Referee: [Section 5] Section 5 (Experiments): while the Top-1 EM improvements are stated, the manuscript does not report variance across random seeds, statistical significance tests, or an ablation that isolates inter-procedural slicing from intra-procedural context; these omissions make it hard to judge whether the 7.1–10.3 pp gains are robust or attributable to the proposed slicing step.
  Authors: We acknowledge that these additional analyses would strengthen the evaluation. In the revised manuscript we will report mean and standard deviation across multiple random seeds, include statistical significance tests (paired t-tests) against baselines, and add an ablation comparing the full inter-procedural approach against an intra-procedural-only variant. This will isolate the contribution of inter-procedural slicing to the reported gains.
  Revision: yes
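The analysis promised in this response can be sketched with the standard library alone. The function below computes the paired t statistic for per-seed scores of two variants run on the same seeds. The function name is an assumption, and the demo values are arbitrary placeholders, not measurements from the paper.

```python
import math
import statistics


def paired_t_statistic(treatment, baseline):
    """Return (t, degrees_of_freedom) for paired per-seed scores."""
    if len(treatment) != len(baseline):
        raise ValueError("both variants must be run on the same seeds")
    diffs = [t - b for t, b in zip(treatment, baseline)]
    if len(diffs) < 2:
        raise ValueError("need at least two seeds")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Placeholder scores in arbitrary units, invented purely to show the call.
inter_procedural = [3.0, 5.0, 7.0]
intra_procedural = [1.0, 2.0, 3.0]

t_value, dof = paired_t_statistic(inter_procedural, intra_procedural)
print(f"mean={statistics.mean(inter_procedural):.2f} "
      f"sd={statistics.stdev(inter_procedural):.2f} t={t_value:.3f} df={dof}")
```

Turning the statistic into a p-value requires the t distribution, which the standard library does not provide; `scipy.stats.ttest_rel` performs the same paired test and reports the p-value directly.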
Circularity Check
No significant circularity in derivation chain
The paper presents an empirical method (TypePro) that augments LLM type inference via inter-procedural slicing for added context and candidate-type generation from slice structure. All reported gains (7.1 pp and 10.3 pp Top-1 EM on ManyTypes4Py and ManyTypes4TypeScript) are framed as direct experimental outcomes on named public datasets. No equations, fitted parameters, predictions that reduce to inputs, or load-bearing self-citations appear in the provided text. The evaluation protocol is externally falsifiable and does not rely on internal re-derivation of its own results.