pith. machine review for the scientific record. sign in

arxiv: 2604.27692 · v1 · submitted 2026-04-30 · 💻 cs.SE

Recognition: unknown

Understanding Bugs in Template Engine-Based Applications: Symptoms, Root Causes, and Fix Patterns

Authors on Pith no claims yet

Pith reviewed 2026-05-07 05:38 UTC · model grok-4.3

classification 💻 cs.SE
keywords template enginesbug studyroot cause analysisfix patternsempirical software engineeringdebuggingJinjaweb applications
0
0 comments X

The pith

Analysis of 1,004 bugs across 15 template engines shows abnormal rendering as the top symptom, driven by syntax misuse and data mismatches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Template engines power much of modern web and configuration output but create hard-to-debug applications because code is split between a template and a host program in another language. This paper reports the first broad empirical study of real bugs in such applications, drawing on 1,004 cases from 15 engines across five languages. It shows that the most common problem is abnormal rendering—wrong or blank output—in almost half of cases, usually because the template syntax is misused or the data passed in from the host does not match what the template expects. Over two-thirds of fixes stay inside the template, yet a significant share require changes to the host-side logic that supplies the data. The authors turn these patterns into concrete advice for building better debuggers and checking tools, and they release two prototype tools for the popular Jinja engine.

Core claim

By analyzing 1,004 bugs across 15 template engines in five programming languages, the study identifies abnormal rendering result as the most prevalent symptom at 48.61 percent, often as silent failures. It categorizes 17 root causes, dominated by syntax misuse, mismatched data context, and incompatible integration. While most bugs are fixed by editing the template itself, more than 20 percent require changes to the surrounding host application logic to correct data issues. The work also produces actionable recommendations and prototype debugging tools for Jinja.

What carries the argument

Empirical taxonomy that manually classifies 1,004 real-world bugs into symptoms, 17 root causes, and fix patterns from reports involving 15 template engines.

If this is right

  • Most bugs produce silent abnormal output, so developers need better ways to inspect rendered results during testing.
  • Syntax misuse and data context mismatches are leading causes, suggesting value in static checkers that verify template syntax and variable bindings against host data.
  • Because over 20 percent of fixes involve host-side changes, integrated development environments should support navigation between templates and their calling code.
  • The identified patterns can inform the design of automated repair tools that suggest fixes in the correct location.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This bug taxonomy could be extended to create a shared benchmark for evaluating new analysis tools for multi-language template applications.
  • The prevalence of silent failures implies that dynamic analysis or logging of template outputs should be standard in deployment pipelines.
  • Similar empirical studies on other multi-language or declarative systems, such as query builders or configuration languages, might uncover parallel root cause patterns.

Load-bearing premise

The 1,004 bugs collected from 15 template engines form a representative and unbiased sample of all bugs in template engine-based applications, and the manual classification into symptoms, root causes, and fix patterns is complete and reproducible without significant selection or labeling bias.

What would settle it

A replication study that gathers an independent set of several hundred template engine bugs from additional engines or languages and finds a markedly different distribution, such as fewer than 40 percent abnormal rendering results or root causes outside the 17 reported categories.

Figures

Figures reproduced from arXiv: 2604.27692 by Chang-ai Sun, Kai Gao, Yu Sun.

Figure 1
Figure 1. Figure 1: An architectural overview of TE applications. view at source ↗
Figure 2
Figure 2. Figure 2: The development workflow of TE applications. The template example (a) is written in the templating view at source ↗
Figure 3
Figure 3. Figure 3: The taxonomy and distribution of bug symptoms in TE applications. Dotted rectangles and solid rectan view at source ↗
Figure 4
Figure 4. Figure 4: The symptom distribution of TE application bugs by engine. view at source ↗
Figure 5
Figure 5. Figure 5: The taxonomy and distribution of bug root causes in TE applications. Dotted rectangles and solid view at source ↗
Figure 6
Figure 6. Figure 6: The relationship between symptoms and root causes. view at source ↗
Figure 7
Figure 7. Figure 7: The root cause distribution of TE application bugs by engine. view at source ↗
Figure 8
Figure 8. Figure 8: The taxonomy and distribution of fix patterns for TE application bugs. Dotted rectangles and solid view at source ↗
Figure 9
Figure 9. Figure 9: The relationship between root causes and fix patterns. view at source ↗
Figure 10
Figure 10. Figure 10: The fix pattern distribution of TE application bugs by engine. view at source ↗
read the original abstract

Template engines are indispensable components in modern software ecosystems, enabling the generation of structured documents and scripts across domains such as web development, Infrastructure as Code, and data engineering. However, the unique architectural characteristics of template engine-based applications (i.e., TE applications), including multi-language composition, opaque data flow, deferred validation, and complex integration, pose significant challenges for diagnosing and resolving bugs in TE applications. While prior research has primarily focused on template engine security, bugs in TE applications remain under-investigated. To bridge this gap, we present the first comprehensive study of TE application bugs. By analyzing 1,004 application bugs across 15 template engines in five programming languages, we identify the symptoms and root causes of TE application bugs and common patterns to fix them. Our findings reveal that Abnormal Rendering Result (e.g., unexpected or blank output) is the most prevalent symptom (48.61%), often manifesting as silent failures that are difficult to diagnose. We identify 17 root causes, with Syntax Misuse, Mismatched Data Context, and Incompatible Integration as the dominant categories. Furthermore, we find that while 67.92% of the bugs are fixed within the template, over 20% require modifications in the host-side logic to resolve data context issues. Based on these findings, we derive actionable implications for tool designers, practitioners, and researchers. To demonstrate the practical utility of our findings, we further develop two prototype tools for the Jinja engine to facilitate the development and debugging of TE applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims to conduct the first comprehensive study of bugs in template engine-based applications by analyzing 1,004 bugs from 15 template engines in five programming languages. It identifies symptoms (with Abnormal Rendering Result at 48.61% being most prevalent), 17 root causes (dominated by Syntax Misuse, Mismatched Data Context, and Incompatible Integration), and fix patterns (67.92% fixed in template, >20% requiring host-side changes). The authors also develop two prototype tools for the Jinja template engine to support development and debugging based on their findings.

Significance. Should the methodology prove sound upon clarification, the study would offer significant value to the software engineering community by providing an empirical taxonomy of bugs in an important but understudied class of applications involving template engines. The large sample size and cross-language, cross-engine analysis are strengths. The practical contribution of prototype tools for Jinja further enhances its utility. However, the current presentation does not allow full assessment of the claims' reliability due to missing methodological details.

major comments (3)
  1. [§3] §3 (Data Collection): The paper provides no details on bug collection sources, GitHub search queries, selected repositories for the 15 engines, time period, or inclusion/exclusion criteria used to arrive at the 1,004 bugs. This is load-bearing for the central claims, as the reported symptom prevalence (48.61% Abnormal Rendering Result) and root-cause distributions cannot be evaluated for selection bias without this information.
  2. [§4] §4 (Classification and Taxonomy): The manual classification into symptoms, 17 root causes, and fix patterns includes no description of inter-rater agreement metrics, resolution of disagreements, or validation steps. This directly undermines the reproducibility of the dominance ordering (Syntax Misuse, Mismatched Data Context, Incompatible Integration) and the fix-pattern statistics (67.92% template-only fixes).
  3. [§5] §5 (Fix Patterns): The finding that over 20% of bugs require host-side logic changes rests entirely on the unvalidated classification; any sampling or labeling bias would render the 67.92% / >20% split non-generalizable.
minor comments (2)
  1. [Abstract] Abstract: The abstract states the sample size and key percentages but does not name the five programming languages or the 15 template engines, which would immediately contextualize the scope.
  2. [§6] §6 (Tools): The prototype tools for Jinja are a positive practical contribution, but additional details on their implementation, usage, and any evaluation would strengthen the demonstration of utility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the potential value of our large-scale empirical study on bugs in template engine-based applications. We agree that greater methodological transparency is needed to allow full assessment of our claims and to address concerns about reproducibility and bias. We will revise the manuscript to incorporate detailed descriptions of data collection and classification procedures. Below we respond point-by-point to the major comments.

read point-by-point responses
  1. Referee: [§3] §3 (Data Collection): The paper provides no details on bug collection sources, GitHub search queries, selected repositories for the 15 engines, time period, or inclusion/exclusion criteria used to arrive at the 1,004 bugs. This is load-bearing for the central claims, as the reported symptom prevalence (48.61% Abnormal Rendering Result) and root-cause distributions cannot be evaluated for selection bias without this information.

    Authors: We acknowledge that §3 currently lacks the level of detail required to evaluate selection bias. In the revised manuscript we will expand §3 with a new subsection that explicitly describes: the primary sources (GitHub issues and pull requests), the search queries and keywords employed to locate relevant reports, the specific repositories selected for each of the 15 engines together with the rationale for their selection, the overall time period covered, and the precise inclusion/exclusion criteria applied to reach the final set of 1,004 bugs. These additions will enable readers to assess the generalizability of the symptom and root-cause distributions we report. revision: yes

  2. Referee: [§4] §4 (Classification and Taxonomy): The manual classification into symptoms, 17 root causes, and fix patterns includes no description of inter-rater agreement metrics, resolution of disagreements, or validation steps. This directly undermines the reproducibility of the dominance ordering (Syntax Misuse, Mismatched Data Context, and Incompatible Integration) and the fix-pattern statistics (67.92% template-only fixes).

    Authors: We agree that a description of the classification process is essential for reproducibility. In the revision we will augment §4 with a dedicated paragraph detailing the procedure: two authors independently classified the bugs using a taxonomy refined through a pilot study; disagreements were resolved via discussion, with a third author consulted when consensus could not be reached; and we will report inter-rater agreement using Cohen’s kappa for each classification dimension (symptoms, root causes, fix patterns). We will also describe the validation steps performed, including iterative taxonomy refinement and cross-checking on a held-out subset. These changes will support the reported dominance ordering and fix-pattern statistics. revision: yes

  3. Referee: [§5] §5 (Fix Patterns): The finding that over 20% of bugs require host-side logic changes rests entirely on the unvalidated classification; any sampling or labeling bias would render the 67.92% / >20% split non-generalizable.

    Authors: We concur that the fix-pattern results in §5 depend directly on the classification described in §4. By incorporating the inter-rater agreement metrics, disagreement-resolution process, and validation steps into the revised §4, we will strengthen the evidential basis for the 67.92% template-only and >20% host-side figures. In the updated §5 we will explicitly cross-reference the enhanced methodological description, provide additional illustrative examples of host-side changes, and discuss limitations on generalizability. This integrated revision will improve the reliability of the reported split. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical bug classification with external data grounding

full rationale

The paper performs a manual analysis of 1,004 bugs mined from GitHub issues across 15 template engines. It reports symptom frequencies (e.g., Abnormal Rendering Result at 48.61%), 17 root-cause categories, and fix-pattern statistics without any equations, fitted parameters, predictions, or first-principles derivations. No step reduces a claimed result to its own inputs by construction, self-definition, or self-citation load-bearing. The taxonomy and percentages are produced by direct inspection of external issue reports rather than any internal normalization or uniqueness theorem imported from prior author work. Self-citations, if present, are incidental and not invoked to justify the completeness of the 17 root causes or the representativeness of the sample. The derivation chain is therefore self-contained against the collected bug corpus.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

This is an empirical bug-classification study. It introduces no free parameters, new mathematical axioms, or invented entities. The central claims rest on the unverified assumption that the collected bug set is representative and that the manual categorization is reliable.

axioms (1)
  • domain assumption The 1,004 bugs collected across 15 template engines form a representative sample of bugs in template engine-based applications.
    All prevalence statistics and the claim of comprehensive coverage depend on this assumption.

pith-pipeline@v0.9.0 · 5572 in / 1577 out tokens · 68347 ms · 2026-05-07T05:38:23.028185+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

95 extracted references · 54 canonical work pages · 1 internal anchor

  1. [1]

    Mouna Abidi, Md Saidur Rahman, Moses Openja, and Foutse Khomh. 2021. Are Multi-Language Design Smells Fault- Prone? An Empirical Study.ACM Trans. Softw. Eng. Methodol.30, 3, Article 29 (Feb. 2021), 56 pages. doi:10.1145/3432690

  2. [2]

    Rigo Armin and Fijalkowski Maciej. 2026. CFFI Overview. https://cffi.readthedocs.io/en/stable/overview.html. Accessed: [2026.01.20]

  3. [3]

    Mahi Begoug, Narjes Bessghaier, Ali Ouni, Eman Abdullah AlOmar, and Mohamed Wiem Mkaouer. 2023. What Do Infrastructure-as-Code Practitioners Discuss: An Empirical Study on Stack Overflow. In2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). IEEE Press, 1–12. doi:10.1109/ESEM56168. 2023.10304847

  4. [4]

    Junming Cao, Bihuan Chen, Chao Sun, Longjie Hu, Shuaihong Wu, and Xin Peng. 2022. Understanding performance problems in deep learning systems. InProceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering(Singapore, Singapore)(ESEC/FSE 2022). Association for Computing Machinery, New Y...

  5. [5]

    The Ultimate Configuration Management Tool? Lessons from a Mixed Methods Study of Ansible's Challenges

    Carolina Carreira, Nuno Saavedra, Alexandra Mendes, and João F. Ferreira. 2025. The Ultimate Configuration Management Tool? Lessons from a Mixed Methods Study of Ansible’s Challenges. arXiv:2504.08678 [cs.SE] https: //arxiv.org/abs/2504.08678

  6. [6]

    Chunyang Chen and Zhenchang Xing. 2016. Towards Correlating Search on Google and Asking on Stack Overflow. In2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), Vol. 1. 83–92. doi:10.1109/ COMPSAC.2016.210

  7. [7]

    Junjie Chen, Yihua Liang, Qingchao Shen, Jiajun Jiang, and Shuochuan Li. 2023. Toward Understanding Deep Learning Framework Bugs.ACM Trans. Softw. Eng. Methodol.32, 6, Article 135 (Sept. 2023), 31 pages. doi:10.1145/3587155

  8. [8]

    Yuntianyi Chen, Yuqi Huai, Yirui He, Shilong Li, Changnam Hong, Qi Alfred Chen, and Joshua Garcia. 2025. A Comprehensive Study of Bug-Fix Patterns in Autonomous Driving Systems.Proc. ACM Softw. Eng.2, FSE, Article FSE018 (June 2025), 23 pages. doi:10.1145/3715733

  9. [9]

    Zhenpeng Chen, Huihan Yao, Yiling Lou, Yanbin Cao, Yuanqiang Liu, Haoyu Wang, and Xuanzhe Liu. 2021. An Empirical Study on Deployment Faults of Deep Learning Based Mobile Applications. InProceedings of the 43rd International Conference on Software Engineering(Madrid, Spain)(ICSE ’21). IEEE Press, 674–685. doi:10.1109/ICSE43902.2021.00068

  10. [10]

    dbt Labs, Inc. 2026. dbt. https://github.com/dbt-labs/dbt-core. Accessed: [2026.01.19]

  11. [11]

    Taijara Loiola De Santana, Paulo Anselmo Da Mota Silveira Neto, Eduardo Santana De Almeida, and Iftekhar Ahmed

  12. [12]

    Bug Analysis in Jupyter Notebook Projects: An Empirical Study.ACM Trans. Softw. Eng. Methodol.33, 4, Article 101 (April 2024), 34 pages. doi:10.1145/3641539

  13. [13]

    Levin, Wolfram Schulte, and Milos Gligoric

    Cheng Ding, Zhong Xu, Michael Y. Levin, Wolfram Schulte, and Milos Gligoric. 2026. TypeJinja: Static Type Checking of Jinja Templates at dbt Labs. InInternational Conference on Software Engineering, Software Engineering in Practice. Association for Computing Machinery, New York, NY, USA, To appear

  14. [14]

    Georgios-Petros Drosos, Thodoris Sotiropoulos, Georgios Alexopoulos, Dimitris Mitropoulos, and Zhendong Su. 2024. When Your Infrastructure Is a Buggy Program: Understanding Faults in Infrastructure as Code Ecosystems.Proc. ACM Program. Lang.8, OOPSLA2, Article 359 (Oct. 2024), 31 pages. doi:10.1145/3689799

  15. [15]

    Xiaohu Du, Xiao Chen, Jialun Cao, Ming Wen, Shing-Chi Cheung, and Hai Jin. 2023. Understanding the Bug Charac- teristics and Fix Strategies of Federated Learning Systems. InProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering(San Francisco, CA, USA)(ESEC/FSE 2023). Association ...

  16. [16]

    Matthew Eernisse. 2026. EJS. https://github.com/mde/ejs. Accessed: [2026-01-21]

  17. [17]

    Morven Gentleman

    Bo Einarsson and W. Morven Gentleman. 1984. Mixed language programming.Software: Practice and Experience14, 4 (1984), 383–395. arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/spe.4380140410 doi:10.1002/spe.4380140410

  18. [18]

    Apache Software Foundation. 2025. FreeMarker. https://github.com/apache/freemarker. Accessed: [2026-01-21]

  19. [19]

    Apache Software Foundation. 2025. Velocity. https://github.com/PaperMC/Velocity. Accessed: [2026-01-21]

  20. [20]

    Django Software Foundation. 2026. Django. https://www.djangoproject.com/. Accessed: [2026.01.19]

  21. [21]

    Django Software Foundation. 2026. Django-Template. https://docs.djangoproject.com/en/6.0/topics/templates/. Ac- cessed: [2026-01-21]

  22. [22]

    Python Software Foundation. 2026. Formatted String Literals. https://docs.python.org/3/tutorial/inputoutput.html# formatted-string-literals. Accessed: [2026.01.23]

  23. [23]

    The Apache Software Foundation. 2026. Apache Airflow. https://airflow.apache.org/. Accessed: [2026.01.19]

  24. [24]

    Joshua Garcia, Yang Feng, Junjie Shen, Sumaya Almanee, Yuan Xia, and Qi Alfred Chen. 2020. A comprehensive study of autonomous vehicle bugs. InProceedings of the ACM/IEEE 42nd International Conference on Software Engineering (Seoul, South Korea)(ICSE ’20). Association for Computing Machinery, New York, NY, USA, 385–396. doi:10.1145/ 3377811.3380397

  25. [25]

    GitHub. 2026. About the dependency graph. https://docs.github.com/en/code-security/concepts/supply-chain-security/ about-the-dependency-graph. Accessed: 2026-04-30

  26. [26]

    Eghan, and Bram Adams

    Manel Grichi, Mouna Abidi, Fehmi Jaafar, Ellis E. Eghan, and Bram Adams. 2021. On the Impact of Interlanguage Dependencies in Multilanguage Systems Empirical Case Study on Java Native Interface Applications (JNI).IEEE Transactions on Reliability70, 1 (2021), 428–440. doi:10.1109/TR.2020.3024873

  27. [27]

    New Digital Group. 2026. Smarty. https://github.com/smarty-php/smarty. Accessed: [2026-01-21]

  28. [28]

    Hao Guan, Ying Xiao, Jiaying Li, Yepang Liu, and Guangdong Bai. 2023. A Comprehensive Study of Real-World Bugs in Machine Learning Model Optimization. InProceedings of the 45th International Conference on Software Engineering (Melbourne, Victoria, Australia)(ICSE ’23). IEEE Press, 147–158. doi:10.1109/ICSE48619.2023.00024

  29. [29]

    Sture Holm. 1979. A Simple Sequentially Rejective Multiple Test Procedure.Scandinavian Journal of Statistics6, 2 (1979), 65–70. http://www.jstor.org/stable/4615733

  30. [30]

    Mingzhe Hu and Yu Zhang. 2023. An empirical study of the Python/C API on evolution and bug patterns.Journal of Software: Evolution and Process35, 2 (2023), e2507. arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/smr.2507 doi:10.1002/smr.2507 , Vol. 1, No. 1, Article . Publication date: May 2018. Understanding Bugs in Template Engine-Based Application...

  31. [31]

    Kaifeng Huang, Bihuan Chen, Susheng Wu, Junming Cao, Lei Ma, and Xin Peng. 2023. Demystifying Dependency Bugs in Deep Learning Stack. InProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering(San Francisco, CA, USA)(ESEC/FSE 2023). Association for Computing Machinery, New York, NY...

  32. [32]

    Nargiz Humbatova, Gunel Jahangirova, Gabriele Bavota, Vincenzo Riccio, Andrea Stocco, and Paolo Tonella. 2020. Taxonomy of real faults in deep learning systems. InProceedings of the ACM/IEEE 42nd International Conference on Software Engineering(Seoul, South Korea)(ICSE ’20). Association for Computing Machinery, New York, NY, USA, 1110–1121. doi:10.1145/33...

  33. [33]

    Md Johirul Islam, Giang Nguyen, Rangeet Pan, and Hridesh Rajan. 2019. A comprehensive study on deep learning bug characteristics. InProceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering(Tallinn, Estonia)(ESEC/FSE 2019). Association for Computing Machinery, New Yo...

  34. [34]

    Md Johirul Islam, Rangeet Pan, Giang Nguyen, and Hridesh Rajan. 2020. Repairing deep neural networks: fix patterns and challenges. InProceedings of the ACM/IEEE 42nd International Conference on Software Engineering(Seoul, South Korea)(ICSE ’20). Association for Computing Machinery, New York, NY, USA, 1135–1146. doi:10.1145/3377811.3380378

  35. [35]

    Yehuda Katz. 2026. Handlebars.js. https://github.com/handlebars-lang/handlebars.js. Accessed: [2026-01-21]

  36. [36]

    James Kettle. 2015. Server-Side Template Injection: RCE for the Modern Webapp. https://blackhat.com/docs/us- 15/materials/us-15-Kettle-Server-Side-Template-Injection-RCE-For-The-Modern-Web-App-wp.pdf

  37. [37]

    Pavneet Singh Kochhar, Dinusha Wijedasa, and David Lo. 2016. A Large Scale Study of Multiple Programming Lan- guages and Code Quality. In2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), Vol. 1. 563–573. doi:10.1109/SANER.2016.112

  38. [38]

    Kruskal and W

    William H. Kruskal and W. Allen Wallis. 1952. Use of Ranks in One-Criterion Variance Analysis.J. Amer. Statist. Assoc. 47, 260 (1952), 583–621. arXiv:https://doi.org/10.1080/01621459.1952.10483441 doi:10.1080/01621459.1952.10483441

  39. [39]

    Laravel. 2026. Blade. https://laravel.com/docs/12.x/blade. Accessed: [2026-01-21]

  40. [40]

    Jan Lehnardt. 2023. mustache.js. https://github.com/janl/mustache.js. Accessed: [2026-01-21]

  41. [41]

    Wen Li, Li Li, and Haipeng Cai. 2022. On the vulnerability proneness of multilingual code. InProceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Singapore, Singapore)(ESEC/FSE 2022). Association for Computing Machinery, New York, NY, USA, 847–859. doi:10. 1145/3540250.3549173

  42. [42]

    Zengyang Li, Sicheng Wang, Wenshuo Wang, Peng Liang, Ran Mo, and Bing Li. 2023. Understanding Bugs in Multi- Language Deep Learning Frameworks. In2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC). 328–338. doi:10.1109/ICPC58990.2023.00047

  43. [43]

    Zengyang Li, Wenshuo Wang, Sicheng Wang, Peng Liang, and Ran Mo. 2023. Understanding Resolution of Multi- Language Bugs: An Empirical Study on Apache Projects. In2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). 1–11. doi:10.1109/ESEM56168.2023.10304793

  44. [44]

    Yepang Liu, Chang Xu, and Shing-Chi Cheung. 2014. Characterizing and detecting performance bugs for smartphone applications. InProceedings of the 36th International Conference on Software Engineering(Hyderabad, India)(ICSE 2014). Association for Computing Machinery, New York, NY, USA, 1013–1024. doi:10.1145/2568225.2568229

  45. [45]

    Weiqi Lu, Yongqiang Tian, Xiaohan Zhong, Haoyang Ma, Zhenyang Xu, Shing-Chi Cheung, and Chengnian Sun. 2025. An Empirical Study of Bugs in Data Visualization Libraries.Proc. ACM Softw. Eng.2, FSE, Article FSE093 (June 2025), 24 pages. doi:10.1145/3729363

  46. [46]

    H. B. Mann and D. R. Whitney. 1947. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other.The Annals of Mathematical Statistics18, 1 (1947), 50–60. http://www.jstor.org/stable/2236101

  47. [47]

    Philip Mayer and Alexander Bauer. 2015. An empirical analysis of the utilization of multiple programming languages in open source projects. InProceedings of the 19th International Conference on Evaluation and Assessment in Software Engineering(Nanjing, China)(EASE ’15). Association for Computing Machinery, New York, NY, USA, Article 4, 10 pages. doi:10.11...

  48. [48]

    Philip Mayer, Michael Kirsch, and Minh Anh Le. 2017. On multi-language software development, cross-language links and accompanying tools: a survey of professional software developers.Journal of Software Engineering Research and Development5, 1 (19 Apr 2017), 1. doi:10.1186/s40411-017-0035-z

  49. [49]

    Oracle. 2025. Java Native Interface Specification. https://docs.oracle.com/en/java/javase/25/docs/specs/jni/intro.html. Accessed: [2026.01.20]

  50. [50]

    Pallets. 2026. How fast is Jinja? https://jinja.palletsprojects.com/en/stable/faq/#how-fast-is-jinja. Accessed: [2026.01.23]

  51. [51]

    Pallets. 2026. Jinja. https://github.com/pallets/jinja. Accessed: [2026-01-21]

  52. [52]

    Terence John Parr. 2004. Enforcing strict model-view separation in template engines. InProceedings of the 13th International Conference on World Wide Web(New York, NY, USA)(WWW ’04). Association for Computing Machinery, , Vol. 1, No. 1, Article . Publication date: May 2018. 28 Kai Gao, Yu Sun, and Chang-ai Sun New York, NY, USA, 224–233. doi:10.1145/988672.988703

  53. [53]

    Lorenzo Pisu, Davide Maiorca, and Giorgio Giacinto. 2024. A Survey of the Overlooked Dangers of Template Engines. arXiv:2405.01118 [cs.CR] https://arxiv.org/abs/2405.01118

  54. [54]

    Pug. 2024. Pug. https://github.com/pugjs/pug. Accessed: [2026-01-21]

  55. [55]

    PyPI. 2025. PyPI Architectural Overview. https://warehouse.pypa.io/application/. Accessed: [2026.01.19]

  56. [56]

    Ravishka Rathnasuriya, Nidhi Majoju, Zihe Song, and Wei Yang. 2025. An Investigation on Numerical Bugs in GPU Programs Towards Automated Bug Detection.Proc. ACM Softw. Eng.2, ISSTA, Article ISSTA073 (June 2025), 24 pages. doi:10.1145/3728950

  57. [57]

    RedHat, Inc. 2026. Ansible. https://github.com/ansible/ansible. Accessed: [2026.01.19]

  58. [58]

    ROCK-SE. 2026. template-engine-bugs. https://github.com/ROCK-SE/template-engine-bugs. Accessed: 2026-04-29

  59. [59]

    Ruby. 2026. ERB. https://github.com/ruby/erb. Accessed: [2026-01-21]

  60. [60]

    Carolyn B. Seaman. 1999. Qualitative Methods in Empirical Studies of Software Engineering.IEEE Trans. Softw. Eng. 25, 4 (July 1999), 557–572. doi:10.1109/32.799955

  61. [61]

    Shrestha Shailabh, Stefano Balocco, and Marat Reimers. 2026. Awesome Template Engine: A curated list of template engines across all programming languages with repository activity stats. https://github.com/sshailabh/awesome- template-engine. Accessed: [2026-01-21]

  62. [62]

    Qingchao Shen, Haoyang Ma, Junjie Chen, Yongqiang Tian, Shing-Chi Cheung, and Xiang Chen. 2021. A comprehensive study of deep learning compiler bugs. InProceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering(Athens, Greece)(ESEC/FSE 2021). Association for Computing Mach...

  63. [63]

    Shopify. 2026. Liquid. https://github.com/Shopify/liquid. Accessed: [2026-01-21]

  64. [64]

    Jiayin Song, Yike Li, Yunzhe Tian, Haoxuan Ma, Honglei Li, Jie Zuo, Jiqiang Liu, and Wenjia Niu. 2025. Investigating the bugs in reinforcement learning programs: Insights from Stack Overflow and GitHub.Automated Software Engineering 33, 1 (23 Sep 2025), 9. doi:10.1007/s10515-025-00555-z

  65. [65]

    Nawrin Sultana, Justin Middleton, Jeffrey Overbey, and Munawar Hafiz. 2016. Understanding and fixing multiple language interoperability issues: the C/Fortran case. InProceedings of the 38th International Conference on Software Engineering(Austin, Texas)(ICSE ’16). Association for Computing Machinery, New York, NY, USA, 772–783. doi:10. 1145/2884781.2884858

  66. [66]

    Symfony. 2026. Twig. https://github.com/twigphp/Twig. Accessed: [2026-01-21]

  67. [67]

    VMware Tanzu. 2026. Spring. https://spring.io/. Accessed: [2026.01.19]

  68. [68]

    Haml Team. 2026. Haml. https://github.com/haml/haml. Accessed: [2026-01-21]

  69. [69]

    Thymeleaf. 2025. Thymeleaf. https://github.com/thymeleaf/thymeleaf. Accessed: [2026-01-21]

  70. [70]

    Federico Tomassetti and Marco Torchiano. 2014. An empirical assessment of polyglot-ism in GitHub. InProceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering(London, England, United Kingdom) (EASE ’14). Association for Computing Machinery, New York, NY, USA, Article 17, 4 pages. doi:10.1145/2601248.2601269

  71. [71]

    Christoph Treude, Ohad Barzilay, and Margaret-Anne Storey. 2011. How do programmers ask and answer questions on the web?: NIER track. In2011 33rd International Conference on Software Engineering (ICSE). 804–807. doi:10.1145/ 1985793.1985907

  72. [72]

    Anthony J Viera and Joanne M Garrett. 2005. Understanding interobserver agreement: the kappa statistic.Family medicine37, 5 (May 2005), 360—363. http://europepmc.org/abstract/MED/15883903

  73. [73]

    Bo Wang, Chong Chen, Junjie Chen, Bowen Xu, Chen Ye, Youfang Lin, Guoliang Dong, and Jun Sun. 2025. A Comprehensive Study of OOP-Related Bugs in C++ Compilers.IEEE Transactions on Software Engineering51, 6 (2025), 1762–1782. doi:10.1109/TSE.2025.3566490

  74. [74]

    Wikipedia. 2025. Convention over configuration. https://en.wikipedia.org/wiki/Convention_over_configuration). Accessed: [2026.04.28]

  75. [75]

    Wikipedia. 2025. Foreign function interface. https://en.wikipedia.org/wiki/Foreign_function_interface. Accessed: [2026.01.22]

  76. [76]

    Wikipedia. 2025. Polyglot (computing). https://en.wikipedia.org/wiki/Polyglot_(computing). Accessed: [2026.01.20]

  77. [77]

    Wikipedia. 2026. Scaffold (programming). https://en.wikipedia.org/wiki/Scaffold_(programming). Accessed: [2026.04.26]

  78. [78]

    Wikipedia. 2026. Snippet (programming). https://en.wikipedia.org/wiki/Snippet_(programming). Accessed: [2026.04.26]

  79. [79]

    Wikipedia. 2026. Template processor. https://en.wikipedia.org/wiki/Template_processor. Accessed: [2026.01.19]

  80. [80]

    Jianyu Wu, Hao He, Wenxin Xiao, Kai Gao, and Minghui Zhou. 2022. Demystifying software release note issues on GitHub. InProceedings of the 30th IEEE/ACM International Conference on Program Comprehension(Virtual Event)(ICPC ’22). Association for Computing Machinery, New York, NY, USA, 602–613. doi:10.1145/3524610.3527919 , Vol. 1, No. 1, Article . Publicat...

Showing first 80 references.