arxiv: 2604.27692 · v1 · submitted 2026-04-30 · 💻 cs.SE

Recognition: unknown

Understanding Bugs in Template Engine-Based Applications: Symptoms, Root Causes, and Fix Patterns

Kai Gao , Yu Sun , Chang-ai Sun

Authors on Pith no claims yet

Pith reviewed 2026-05-07 05:38 UTC · model grok-4.3

classification 💻 cs.SE

keywords template enginesbug studyroot cause analysisfix patternsempirical software engineeringdebuggingJinjaweb applications

0 comments

The pith

Analysis of 1,004 bugs across 15 template engines shows abnormal rendering as the top symptom, driven by syntax misuse and data mismatches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Template engines power much of modern web and configuration output but create hard-to-debug applications because code is split between a template and a host program in another language. This paper reports the first broad empirical study of real bugs in such applications, drawing on 1,004 cases from 15 engines across five languages. It shows that the most common problem is abnormal rendering—wrong or blank output—in almost half of cases, usually because the template syntax is misused or the data passed in from the host does not match what the template expects. Over two-thirds of fixes stay inside the template, yet a significant share require changes to the host-side logic that supplies the data. The authors turn these patterns into concrete advice for building better debuggers and checking tools, and they release two prototype tools for the popular Jinja engine.

Core claim

By analyzing 1,004 bugs across 15 template engines in five programming languages, the study identifies abnormal rendering result as the most prevalent symptom at 48.61 percent, often as silent failures. It categorizes 17 root causes, dominated by syntax misuse, mismatched data context, and incompatible integration. While most bugs are fixed by editing the template itself, more than 20 percent require changes to the surrounding host application logic to correct data issues. The work also produces actionable recommendations and prototype debugging tools for Jinja.

What carries the argument

Empirical taxonomy that manually classifies 1,004 real-world bugs into symptoms, 17 root causes, and fix patterns from reports involving 15 template engines.

If this is right

Most bugs produce silent abnormal output, so developers need better ways to inspect rendered results during testing.
Syntax misuse and data context mismatches are leading causes, suggesting value in static checkers that verify template syntax and variable bindings against host data.
Because over 20 percent of fixes involve host-side changes, integrated development environments should support navigation between templates and their calling code.
The identified patterns can inform the design of automated repair tools that suggest fixes in the correct location.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This bug taxonomy could be extended to create a shared benchmark for evaluating new analysis tools for multi-language template applications.
The prevalence of silent failures implies that dynamic analysis or logging of template outputs should be standard in deployment pipelines.
Similar empirical studies on other multi-language or declarative systems, such as query builders or configuration languages, might uncover parallel root cause patterns.

Load-bearing premise

The 1,004 bugs collected from 15 template engines form a representative and unbiased sample of all bugs in template engine-based applications, and the manual classification into symptoms, root causes, and fix patterns is complete and reproducible without significant selection or labeling bias.

What would settle it

A replication study that gathers an independent set of several hundred template engine bugs from additional engines or languages and finds a markedly different distribution, such as fewer than 40 percent abnormal rendering results or root causes outside the 17 reported categories.

Figures

Figures reproduced from arXiv: 2604.27692 by Chang-ai Sun, Kai Gao, Yu Sun.

**Figure 1.** Figure 1: An architectural overview of TE applications. view at source ↗

**Figure 2.** Figure 2: The development workflow of TE applications. The template example (a) is written in the templating view at source ↗

**Figure 3.** Figure 3: The taxonomy and distribution of bug symptoms in TE applications. Dotted rectangles and solid rectan view at source ↗

**Figure 4.** Figure 4: The symptom distribution of TE application bugs by engine. view at source ↗

**Figure 5.** Figure 5: The taxonomy and distribution of bug root causes in TE applications. Dotted rectangles and solid view at source ↗

**Figure 6.** Figure 6: The relationship between symptoms and root causes. view at source ↗

**Figure 7.** Figure 7: The root cause distribution of TE application bugs by engine. view at source ↗

**Figure 8.** Figure 8: The taxonomy and distribution of fix patterns for TE application bugs. Dotted rectangles and solid view at source ↗

**Figure 9.** Figure 9: The relationship between root causes and fix patterns. view at source ↗

**Figure 10.** Figure 10: The fix pattern distribution of TE application bugs by engine. view at source ↗

read the original abstract

Template engines are indispensable components in modern software ecosystems, enabling the generation of structured documents and scripts across domains such as web development, Infrastructure as Code, and data engineering. However, the unique architectural characteristics of template engine-based applications (i.e., TE applications), including multi-language composition, opaque data flow, deferred validation, and complex integration, pose significant challenges for diagnosing and resolving bugs in TE applications. While prior research has primarily focused on template engine security, bugs in TE applications remain under-investigated. To bridge this gap, we present the first comprehensive study of TE application bugs. By analyzing 1,004 application bugs across 15 template engines in five programming languages, we identify the symptoms and root causes of TE application bugs and common patterns to fix them. Our findings reveal that Abnormal Rendering Result (e.g., unexpected or blank output) is the most prevalent symptom (48.61%), often manifesting as silent failures that are difficult to diagnose. We identify 17 root causes, with Syntax Misuse, Mismatched Data Context, and Incompatible Integration as the dominant categories. Furthermore, we find that while 67.92% of the bugs are fixed within the template, over 20% require modifications in the host-side logic to resolve data context issues. Based on these findings, we derive actionable implications for tool designers, practitioners, and researchers. To demonstrate the practical utility of our findings, we further develop two prototype tools for the Jinja engine to facilitate the development and debugging of TE applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper supplies a usable taxonomy and prevalence numbers for bugs in template-engine apps from 1004 cases, which is new and practical, but the sampling and labeling steps look thin on validation.

read the letter

The core contribution is an empirical breakdown of 1004 bugs across 15 template engines in five languages. Abnormal rendering results account for nearly half the cases, 17 root-cause categories are defined with syntax misuse and data-context mismatches at the top, and two-thirds of fixes stay inside the template while over 20 percent touch host logic. They also ship two prototype tools for Jinja. That package of symptoms, causes, fix locations, and working tools is the first broad look at non-security bugs in this setting, and the numbers give developers something concrete to act on in web, IaC, and data pipelines work. The authors clearly did the legwork of pulling real issues and mapping them to categories, which is more than most prior template papers managed. The prototypes show they tried to close the loop from findings to practice. The weak points sit in the data pipeline. Bug collection from GitHub issues with keyword searches tends to over-represent easily described symptoms and under-represent silent or unreported problems. The abstract and stress-test note give no inter-rater agreement figures or explicit selection criteria, so the 48.61 percent and 67.92 percent figures could move under a different sample or labeling protocol. The choice of only 15 engines also leaves open whether the dominance ordering holds for less popular engines or closed-source use. These are standard risks in mining studies rather than fatal flaws, but they need explicit discussion and sensitivity checks in the full text. Readers who build debugging tools or write guidelines for template-heavy code will find the taxonomy and fix-location stats directly useful. Researchers doing similar empirical work on language-specific bugs can treat it as a baseline to extend. The paper is coherent on its own terms and engages the right prior literature on template security versus general bugs. It deserves a serious referee who can press on the methodology section and ask for the raw classification protocol or agreement metrics. I would send it out rather than desk-reject, with the expectation that revisions will tighten the data description and perhaps add a small validation set.

Referee Report

3 major / 2 minor

Summary. The paper claims to conduct the first comprehensive study of bugs in template engine-based applications by analyzing 1,004 bugs from 15 template engines in five programming languages. It identifies symptoms (with Abnormal Rendering Result at 48.61% being most prevalent), 17 root causes (dominated by Syntax Misuse, Mismatched Data Context, and Incompatible Integration), and fix patterns (67.92% fixed in template, >20% requiring host-side changes). The authors also develop two prototype tools for the Jinja template engine to support development and debugging based on their findings.

Significance. Should the methodology prove sound upon clarification, the study would offer significant value to the software engineering community by providing an empirical taxonomy of bugs in an important but understudied class of applications involving template engines. The large sample size and cross-language, cross-engine analysis are strengths. The practical contribution of prototype tools for Jinja further enhances its utility. However, the current presentation does not allow full assessment of the claims' reliability due to missing methodological details.

major comments (3)

[§3] §3 (Data Collection): The paper provides no details on bug collection sources, GitHub search queries, selected repositories for the 15 engines, time period, or inclusion/exclusion criteria used to arrive at the 1,004 bugs. This is load-bearing for the central claims, as the reported symptom prevalence (48.61% Abnormal Rendering Result) and root-cause distributions cannot be evaluated for selection bias without this information.
[§4] §4 (Classification and Taxonomy): The manual classification into symptoms, 17 root causes, and fix patterns includes no description of inter-rater agreement metrics, resolution of disagreements, or validation steps. This directly undermines the reproducibility of the dominance ordering (Syntax Misuse, Mismatched Data Context, Incompatible Integration) and the fix-pattern statistics (67.92% template-only fixes).
[§5] §5 (Fix Patterns): The finding that over 20% of bugs require host-side logic changes rests entirely on the unvalidated classification; any sampling or labeling bias would render the 67.92% / >20% split non-generalizable.

minor comments (2)

[Abstract] Abstract: The abstract states the sample size and key percentages but does not name the five programming languages or the 15 template engines, which would immediately contextualize the scope.
[§6] §6 (Tools): The prototype tools for Jinja are a positive practical contribution, but additional details on their implementation, usage, and any evaluation would strengthen the demonstration of utility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the potential value of our large-scale empirical study on bugs in template engine-based applications. We agree that greater methodological transparency is needed to allow full assessment of our claims and to address concerns about reproducibility and bias. We will revise the manuscript to incorporate detailed descriptions of data collection and classification procedures. Below we respond point-by-point to the major comments.

read point-by-point responses

Referee: [§3] §3 (Data Collection): The paper provides no details on bug collection sources, GitHub search queries, selected repositories for the 15 engines, time period, or inclusion/exclusion criteria used to arrive at the 1,004 bugs. This is load-bearing for the central claims, as the reported symptom prevalence (48.61% Abnormal Rendering Result) and root-cause distributions cannot be evaluated for selection bias without this information.

Authors: We acknowledge that §3 currently lacks the level of detail required to evaluate selection bias. In the revised manuscript we will expand §3 with a new subsection that explicitly describes: the primary sources (GitHub issues and pull requests), the search queries and keywords employed to locate relevant reports, the specific repositories selected for each of the 15 engines together with the rationale for their selection, the overall time period covered, and the precise inclusion/exclusion criteria applied to reach the final set of 1,004 bugs. These additions will enable readers to assess the generalizability of the symptom and root-cause distributions we report. revision: yes
Referee: [§4] §4 (Classification and Taxonomy): The manual classification into symptoms, 17 root causes, and fix patterns includes no description of inter-rater agreement metrics, resolution of disagreements, or validation steps. This directly undermines the reproducibility of the dominance ordering (Syntax Misuse, Mismatched Data Context, and Incompatible Integration) and the fix-pattern statistics (67.92% template-only fixes).

Authors: We agree that a description of the classification process is essential for reproducibility. In the revision we will augment §4 with a dedicated paragraph detailing the procedure: two authors independently classified the bugs using a taxonomy refined through a pilot study; disagreements were resolved via discussion, with a third author consulted when consensus could not be reached; and we will report inter-rater agreement using Cohen’s kappa for each classification dimension (symptoms, root causes, fix patterns). We will also describe the validation steps performed, including iterative taxonomy refinement and cross-checking on a held-out subset. These changes will support the reported dominance ordering and fix-pattern statistics. revision: yes
Referee: [§5] §5 (Fix Patterns): The finding that over 20% of bugs require host-side logic changes rests entirely on the unvalidated classification; any sampling or labeling bias would render the 67.92% / >20% split non-generalizable.

Authors: We concur that the fix-pattern results in §5 depend directly on the classification described in §4. By incorporating the inter-rater agreement metrics, disagreement-resolution process, and validation steps into the revised §4, we will strengthen the evidential basis for the 67.92% template-only and >20% host-side figures. In the updated §5 we will explicitly cross-reference the enhanced methodological description, provide additional illustrative examples of host-side changes, and discuss limitations on generalizability. This integrated revision will improve the reliability of the reported split. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical bug classification with external data grounding

full rationale

The paper performs a manual analysis of 1,004 bugs mined from GitHub issues across 15 template engines. It reports symptom frequencies (e.g., Abnormal Rendering Result at 48.61%), 17 root-cause categories, and fix-pattern statistics without any equations, fitted parameters, predictions, or first-principles derivations. No step reduces a claimed result to its own inputs by construction, self-definition, or self-citation load-bearing. The taxonomy and percentages are produced by direct inspection of external issue reports rather than any internal normalization or uniqueness theorem imported from prior author work. Self-citations, if present, are incidental and not invoked to justify the completeness of the 17 root causes or the representativeness of the sample. The derivation chain is therefore self-contained against the collected bug corpus.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

This is an empirical bug-classification study. It introduces no free parameters, new mathematical axioms, or invented entities. The central claims rest on the unverified assumption that the collected bug set is representative and that the manual categorization is reliable.

axioms (1)

domain assumption The 1,004 bugs collected across 15 template engines form a representative sample of bugs in template engine-based applications.
All prevalence statistics and the claim of comprehensive coverage depend on this assumption.

pith-pipeline@v0.9.0 · 5572 in / 1577 out tokens · 68347 ms · 2026-05-07T05:38:23.028185+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

95 extracted references · 54 canonical work pages · 1 internal anchor

[1]

Mouna Abidi, Md Saidur Rahman, Moses Openja, and Foutse Khomh. 2021. Are Multi-Language Design Smells Fault- Prone? An Empirical Study.ACM Trans. Softw. Eng. Methodol.30, 3, Article 29 (Feb. 2021), 56 pages. doi:10.1145/3432690

work page doi:10.1145/3432690 2021
[2]

Rigo Armin and Fijalkowski Maciej. 2026. CFFI Overview. https://cffi.readthedocs.io/en/stable/overview.html. Accessed: [2026.01.20]

2026
[3]

Mahi Begoug, Narjes Bessghaier, Ali Ouni, Eman Abdullah AlOmar, and Mohamed Wiem Mkaouer. 2023. What Do Infrastructure-as-Code Practitioners Discuss: An Empirical Study on Stack Overflow. In2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). IEEE Press, 1–12. doi:10.1109/ESEM56168. 2023.10304847

work page doi:10.1109/esem56168 2023
[4]

Junming Cao, Bihuan Chen, Chao Sun, Longjie Hu, Shuaihong Wu, and Xin Peng. 2022. Understanding performance problems in deep learning systems. InProceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering(Singapore, Singapore)(ESEC/FSE 2022). Association for Computing Machinery, New Y...

work page doi:10.1145/3540250.3549123 2022
[5]

The Ultimate Configuration Management Tool? Lessons from a Mixed Methods Study of Ansible's Challenges

Carolina Carreira, Nuno Saavedra, Alexandra Mendes, and João F. Ferreira. 2025. The Ultimate Configuration Management Tool? Lessons from a Mixed Methods Study of Ansible’s Challenges. arXiv:2504.08678 [cs.SE] https: //arxiv.org/abs/2504.08678

work page internal anchor Pith review Pith/arXiv arXiv 2025
[6]

Chunyang Chen and Zhenchang Xing. 2016. Towards Correlating Search on Google and Asking on Stack Overflow. In2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), Vol. 1. 83–92. doi:10.1109/ COMPSAC.2016.210

2016
[7]

Junjie Chen, Yihua Liang, Qingchao Shen, Jiajun Jiang, and Shuochuan Li. 2023. Toward Understanding Deep Learning Framework Bugs.ACM Trans. Softw. Eng. Methodol.32, 6, Article 135 (Sept. 2023), 31 pages. doi:10.1145/3587155

work page doi:10.1145/3587155 2023
[8]

Yuntianyi Chen, Yuqi Huai, Yirui He, Shilong Li, Changnam Hong, Qi Alfred Chen, and Joshua Garcia. 2025. A Comprehensive Study of Bug-Fix Patterns in Autonomous Driving Systems.Proc. ACM Softw. Eng.2, FSE, Article FSE018 (June 2025), 23 pages. doi:10.1145/3715733

work page doi:10.1145/3715733 2025
[9]

Zhenpeng Chen, Huihan Yao, Yiling Lou, Yanbin Cao, Yuanqiang Liu, Haoyu Wang, and Xuanzhe Liu. 2021. An Empirical Study on Deployment Faults of Deep Learning Based Mobile Applications. InProceedings of the 43rd International Conference on Software Engineering(Madrid, Spain)(ICSE ’21). IEEE Press, 674–685. doi:10.1109/ICSE43902.2021.00068

work page doi:10.1109/icse43902.2021.00068 2021
[10]

dbt Labs, Inc. 2026. dbt. https://github.com/dbt-labs/dbt-core. Accessed: [2026.01.19]

2026
[11]

Taijara Loiola De Santana, Paulo Anselmo Da Mota Silveira Neto, Eduardo Santana De Almeida, and Iftekhar Ahmed
[12]

Bug Analysis in Jupyter Notebook Projects: An Empirical Study.ACM Trans. Softw. Eng. Methodol.33, 4, Article 101 (April 2024), 34 pages. doi:10.1145/3641539

work page doi:10.1145/3641539 2024
[13]

Levin, Wolfram Schulte, and Milos Gligoric

Cheng Ding, Zhong Xu, Michael Y. Levin, Wolfram Schulte, and Milos Gligoric. 2026. TypeJinja: Static Type Checking of Jinja Templates at dbt Labs. InInternational Conference on Software Engineering, Software Engineering in Practice. Association for Computing Machinery, New York, NY, USA, To appear

2026
[14]

Georgios-Petros Drosos, Thodoris Sotiropoulos, Georgios Alexopoulos, Dimitris Mitropoulos, and Zhendong Su. 2024. When Your Infrastructure Is a Buggy Program: Understanding Faults in Infrastructure as Code Ecosystems.Proc. ACM Program. Lang.8, OOPSLA2, Article 359 (Oct. 2024), 31 pages. doi:10.1145/3689799

work page doi:10.1145/3689799 2024
[15]

Xiaohu Du, Xiao Chen, Jialun Cao, Ming Wen, Shing-Chi Cheung, and Hai Jin. 2023. Understanding the Bug Charac- teristics and Fix Strategies of Federated Learning Systems. InProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering(San Francisco, CA, USA)(ESEC/FSE 2023). Association ...

work page doi:10.1145/3611643.3616347 2023
[16]

Matthew Eernisse. 2026. EJS. https://github.com/mde/ejs. Accessed: [2026-01-21]

2026
[17]

Morven Gentleman

Bo Einarsson and W. Morven Gentleman. 1984. Mixed language programming.Software: Practice and Experience14, 4 (1984), 383–395. arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/spe.4380140410 doi:10.1002/spe.4380140410

work page doi:10.1002/spe.4380140410 1984
[18]

Apache Software Foundation. 2025. FreeMarker. https://github.com/apache/freemarker. Accessed: [2026-01-21]

2025
[19]

Apache Software Foundation. 2025. Velocity. https://github.com/PaperMC/Velocity. Accessed: [2026-01-21]

2025
[20]

Django Software Foundation. 2026. Django. https://www.djangoproject.com/. Accessed: [2026.01.19]

2026
[21]

Django Software Foundation. 2026. Django-Template. https://docs.djangoproject.com/en/6.0/topics/templates/. Ac- cessed: [2026-01-21]

2026
[22]

Python Software Foundation. 2026. Formatted String Literals. https://docs.python.org/3/tutorial/inputoutput.html# formatted-string-literals. Accessed: [2026.01.23]

2026
[23]

The Apache Software Foundation. 2026. Apache Airflow. https://airflow.apache.org/. Accessed: [2026.01.19]

2026
[24]

Joshua Garcia, Yang Feng, Junjie Shen, Sumaya Almanee, Yuan Xia, and Qi Alfred Chen. 2020. A comprehensive study of autonomous vehicle bugs. InProceedings of the ACM/IEEE 42nd International Conference on Software Engineering (Seoul, South Korea)(ICSE ’20). Association for Computing Machinery, New York, NY, USA, 385–396. doi:10.1145/ 3377811.3380397

work page arXiv 2020
[25]

GitHub. 2026. About the dependency graph. https://docs.github.com/en/code-security/concepts/supply-chain-security/ about-the-dependency-graph. Accessed: 2026-04-30

2026
[26]

Eghan, and Bram Adams

Manel Grichi, Mouna Abidi, Fehmi Jaafar, Ellis E. Eghan, and Bram Adams. 2021. On the Impact of Interlanguage Dependencies in Multilanguage Systems Empirical Case Study on Java Native Interface Applications (JNI).IEEE Transactions on Reliability70, 1 (2021), 428–440. doi:10.1109/TR.2020.3024873

work page doi:10.1109/tr.2020.3024873 2021
[27]

New Digital Group. 2026. Smarty. https://github.com/smarty-php/smarty. Accessed: [2026-01-21]

2026
[28]

Hao Guan, Ying Xiao, Jiaying Li, Yepang Liu, and Guangdong Bai. 2023. A Comprehensive Study of Real-World Bugs in Machine Learning Model Optimization. InProceedings of the 45th International Conference on Software Engineering (Melbourne, Victoria, Australia)(ICSE ’23). IEEE Press, 147–158. doi:10.1109/ICSE48619.2023.00024

work page doi:10.1109/icse48619.2023.00024 2023
[29]

Sture Holm. 1979. A Simple Sequentially Rejective Multiple Test Procedure.Scandinavian Journal of Statistics6, 2 (1979), 65–70. http://www.jstor.org/stable/4615733

work page arXiv 1979
[30]

Mingzhe Hu and Yu Zhang. 2023. An empirical study of the Python/C API on evolution and bug patterns.Journal of Software: Evolution and Process35, 2 (2023), e2507. arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/smr.2507 doi:10.1002/smr.2507 , Vol. 1, No. 1, Article . Publication date: May 2018. Understanding Bugs in Template Engine-Based Application...

work page doi:10.1002/smr.2507 2023
[31]

Kaifeng Huang, Bihuan Chen, Susheng Wu, Junming Cao, Lei Ma, and Xin Peng. 2023. Demystifying Dependency Bugs in Deep Learning Stack. InProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering(San Francisco, CA, USA)(ESEC/FSE 2023). Association for Computing Machinery, New York, NY...

work page doi:10.1145/3611643.3616325 2023
[32]

Nargiz Humbatova, Gunel Jahangirova, Gabriele Bavota, Vincenzo Riccio, Andrea Stocco, and Paolo Tonella. 2020. Taxonomy of real faults in deep learning systems. InProceedings of the ACM/IEEE 42nd International Conference on Software Engineering(Seoul, South Korea)(ICSE ’20). Association for Computing Machinery, New York, NY, USA, 1110–1121. doi:10.1145/33...

work page doi:10.1145/3377811.3380395 2020
[33]

Md Johirul Islam, Giang Nguyen, Rangeet Pan, and Hridesh Rajan. 2019. A comprehensive study on deep learning bug characteristics. InProceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering(Tallinn, Estonia)(ESEC/FSE 2019). Association for Computing Machinery, New Yo...

work page doi:10.1145/3338906.3338955 2019
[34]

Md Johirul Islam, Rangeet Pan, Giang Nguyen, and Hridesh Rajan. 2020. Repairing deep neural networks: fix patterns and challenges. InProceedings of the ACM/IEEE 42nd International Conference on Software Engineering(Seoul, South Korea)(ICSE ’20). Association for Computing Machinery, New York, NY, USA, 1135–1146. doi:10.1145/3377811.3380378

work page doi:10.1145/3377811.3380378 2020
[35]

Yehuda Katz. 2026. Handlebars.js. https://github.com/handlebars-lang/handlebars.js. Accessed: [2026-01-21]

2026
[36]

James Kettle. 2015. Server-Side Template Injection: RCE for the Modern Webapp. https://blackhat.com/docs/us- 15/materials/us-15-Kettle-Server-Side-Template-Injection-RCE-For-The-Modern-Web-App-wp.pdf

2015
[37]

Pavneet Singh Kochhar, Dinusha Wijedasa, and David Lo. 2016. A Large Scale Study of Multiple Programming Lan- guages and Code Quality. In2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), Vol. 1. 563–573. doi:10.1109/SANER.2016.112

work page doi:10.1109/saner.2016.112 2016
[38]

Kruskal and W

William H. Kruskal and W. Allen Wallis. 1952. Use of Ranks in One-Criterion Variance Analysis.J. Amer. Statist. Assoc. 47, 260 (1952), 583–621. arXiv:https://doi.org/10.1080/01621459.1952.10483441 doi:10.1080/01621459.1952.10483441

work page doi:10.1080/01621459.1952.10483441 1952
[39]

Laravel. 2026. Blade. https://laravel.com/docs/12.x/blade. Accessed: [2026-01-21]

2026
[40]

Jan Lehnardt. 2023. mustache.js. https://github.com/janl/mustache.js. Accessed: [2026-01-21]

2023
[41]

Wen Li, Li Li, and Haipeng Cai. 2022. On the vulnerability proneness of multilingual code. InProceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Singapore, Singapore)(ESEC/FSE 2022). Association for Computing Machinery, New York, NY, USA, 847–859. doi:10. 1145/3540250.3549173

work page arXiv 2022
[42]

Zengyang Li, Sicheng Wang, Wenshuo Wang, Peng Liang, Ran Mo, and Bing Li. 2023. Understanding Bugs in Multi- Language Deep Learning Frameworks. In2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC). 328–338. doi:10.1109/ICPC58990.2023.00047

work page doi:10.1109/icpc58990.2023.00047 2023
[43]

Zengyang Li, Wenshuo Wang, Sicheng Wang, Peng Liang, and Ran Mo. 2023. Understanding Resolution of Multi- Language Bugs: An Empirical Study on Apache Projects. In2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). 1–11. doi:10.1109/ESEM56168.2023.10304793

work page doi:10.1109/esem56168.2023.10304793 2023
[44]

Yepang Liu, Chang Xu, and Shing-Chi Cheung. 2014. Characterizing and detecting performance bugs for smartphone applications. InProceedings of the 36th International Conference on Software Engineering(Hyderabad, India)(ICSE 2014). Association for Computing Machinery, New York, NY, USA, 1013–1024. doi:10.1145/2568225.2568229

work page doi:10.1145/2568225.2568229 2014
[45]

Weiqi Lu, Yongqiang Tian, Xiaohan Zhong, Haoyang Ma, Zhenyang Xu, Shing-Chi Cheung, and Chengnian Sun. 2025. An Empirical Study of Bugs in Data Visualization Libraries.Proc. ACM Softw. Eng.2, FSE, Article FSE093 (June 2025), 24 pages. doi:10.1145/3729363

work page doi:10.1145/3729363 2025
[46]

H. B. Mann and D. R. Whitney. 1947. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other.The Annals of Mathematical Statistics18, 1 (1947), 50–60. http://www.jstor.org/stable/2236101

work page arXiv 1947
[47]

Philip Mayer and Alexander Bauer. 2015. An empirical analysis of the utilization of multiple programming languages in open source projects. InProceedings of the 19th International Conference on Evaluation and Assessment in Software Engineering(Nanjing, China)(EASE ’15). Association for Computing Machinery, New York, NY, USA, Article 4, 10 pages. doi:10.11...

work page doi:10.1145/2745802.2745805 2015
[48]

Philip Mayer, Michael Kirsch, and Minh Anh Le. 2017. On multi-language software development, cross-language links and accompanying tools: a survey of professional software developers.Journal of Software Engineering Research and Development5, 1 (19 Apr 2017), 1. doi:10.1186/s40411-017-0035-z

work page doi:10.1186/s40411-017-0035-z 2017
[49]

Oracle. 2025. Java Native Interface Specification. https://docs.oracle.com/en/java/javase/25/docs/specs/jni/intro.html. Accessed: [2026.01.20]

2025
[50]

Pallets. 2026. How fast is Jinja? https://jinja.palletsprojects.com/en/stable/faq/#how-fast-is-jinja. Accessed: [2026.01.23]

2026
[51]

Pallets. 2026. Jinja. https://github.com/pallets/jinja. Accessed: [2026-01-21]

2026
[52]

Terence John Parr. 2004. Enforcing strict model-view separation in template engines. InProceedings of the 13th International Conference on World Wide Web(New York, NY, USA)(WWW ’04). Association for Computing Machinery, , Vol. 1, No. 1, Article . Publication date: May 2018. 28 Kai Gao, Yu Sun, and Chang-ai Sun New York, NY, USA, 224–233. doi:10.1145/988672.988703

work page doi:10.1145/988672.988703 2004
[53]

Lorenzo Pisu, Davide Maiorca, and Giorgio Giacinto. 2024. A Survey of the Overlooked Dangers of Template Engines. arXiv:2405.01118 [cs.CR] https://arxiv.org/abs/2405.01118

work page arXiv 2024
[54]

Pug. 2024. Pug. https://github.com/pugjs/pug. Accessed: [2026-01-21]

2024
[55]

PyPI. 2025. PyPI Architectural Overview. https://warehouse.pypa.io/application/. Accessed: [2026.01.19]

2025
[56]

Ravishka Rathnasuriya, Nidhi Majoju, Zihe Song, and Wei Yang. 2025. An Investigation on Numerical Bugs in GPU Programs Towards Automated Bug Detection.Proc. ACM Softw. Eng.2, ISSTA, Article ISSTA073 (June 2025), 24 pages. doi:10.1145/3728950

work page doi:10.1145/3728950 2025
[57]

RedHat, Inc. 2026. Ansible. https://github.com/ansible/ansible. Accessed: [2026.01.19]

2026
[58]

ROCK-SE. 2026. template-engine-bugs. https://github.com/ROCK-SE/template-engine-bugs. Accessed: 2026-04-29

2026
[59]

Ruby. 2026. ERB. https://github.com/ruby/erb. Accessed: [2026-01-21]

2026
[60]

Carolyn B. Seaman. 1999. Qualitative Methods in Empirical Studies of Software Engineering.IEEE Trans. Softw. Eng. 25, 4 (July 1999), 557–572. doi:10.1109/32.799955

work page doi:10.1109/32.799955 1999
[61]

Shrestha Shailabh, Stefano Balocco, and Marat Reimers. 2026. Awesome Template Engine: A curated list of template engines across all programming languages with repository activity stats. https://github.com/sshailabh/awesome- template-engine. Accessed: [2026-01-21]

2026
[62]

Qingchao Shen, Haoyang Ma, Junjie Chen, Yongqiang Tian, Shing-Chi Cheung, and Xiang Chen. 2021. A comprehensive study of deep learning compiler bugs. InProceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering(Athens, Greece)(ESEC/FSE 2021). Association for Computing Mach...

work page doi:10.1145/3468264.3468591 2021
[63]

Shopify. 2026. Liquid. https://github.com/Shopify/liquid. Accessed: [2026-01-21]

2026
[64]

Jiayin Song, Yike Li, Yunzhe Tian, Haoxuan Ma, Honglei Li, Jie Zuo, Jiqiang Liu, and Wenjia Niu. 2025. Investigating the bugs in reinforcement learning programs: Insights from Stack Overflow and GitHub.Automated Software Engineering 33, 1 (23 Sep 2025), 9. doi:10.1007/s10515-025-00555-z

work page doi:10.1007/s10515-025-00555-z 2025
[65]

Nawrin Sultana, Justin Middleton, Jeffrey Overbey, and Munawar Hafiz. 2016. Understanding and fixing multiple language interoperability issues: the C/Fortran case. InProceedings of the 38th International Conference on Software Engineering(Austin, Texas)(ICSE ’16). Association for Computing Machinery, New York, NY, USA, 772–783. doi:10. 1145/2884781.2884858

work page arXiv 2016
[66]

Symfony. 2026. Twig. https://github.com/twigphp/Twig. Accessed: [2026-01-21]

2026
[67]

VMware Tanzu. 2026. Spring. https://spring.io/. Accessed: [2026.01.19]

2026
[68]

Haml Team. 2026. Haml. https://github.com/haml/haml. Accessed: [2026-01-21]

2026
[69]

Thymeleaf. 2025. Thymeleaf. https://github.com/thymeleaf/thymeleaf. Accessed: [2026-01-21]

2025
[70]

Federico Tomassetti and Marco Torchiano. 2014. An empirical assessment of polyglot-ism in GitHub. InProceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering(London, England, United Kingdom) (EASE ’14). Association for Computing Machinery, New York, NY, USA, Article 17, 4 pages. doi:10.1145/2601248.2601269

work page doi:10.1145/2601248.2601269 2014
[71]

Christoph Treude, Ohad Barzilay, and Margaret-Anne Storey. 2011. How do programmers ask and answer questions on the web?: NIER track. In2011 33rd International Conference on Software Engineering (ICSE). 804–807. doi:10.1145/ 1985793.1985907

work page arXiv 2011
[72]

Anthony J Viera and Joanne M Garrett. 2005. Understanding interobserver agreement: the kappa statistic.Family medicine37, 5 (May 2005), 360—363. http://europepmc.org/abstract/MED/15883903

work page arXiv 2005
[73]

Bo Wang, Chong Chen, Junjie Chen, Bowen Xu, Chen Ye, Youfang Lin, Guoliang Dong, and Jun Sun. 2025. A Comprehensive Study of OOP-Related Bugs in C++ Compilers.IEEE Transactions on Software Engineering51, 6 (2025), 1762–1782. doi:10.1109/TSE.2025.3566490

work page doi:10.1109/tse.2025.3566490 2025
[74]

Wikipedia. 2025. Convention over configuration. https://en.wikipedia.org/wiki/Convention_over_configuration). Accessed: [2026.04.28]

2025
[75]

Wikipedia. 2025. Foreign function interface. https://en.wikipedia.org/wiki/Foreign_function_interface. Accessed: [2026.01.22]

2025
[76]

Wikipedia. 2025. Polyglot (computing). https://en.wikipedia.org/wiki/Polyglot_(computing). Accessed: [2026.01.20]

2025
[77]

Wikipedia. 2026. Scaffold (programming). https://en.wikipedia.org/wiki/Scaffold_(programming). Accessed: [2026.04.26]

2026
[78]

Wikipedia. 2026. Snippet (programming). https://en.wikipedia.org/wiki/Snippet_(programming). Accessed: [2026.04.26]

2026
[79]

Wikipedia. 2026. Template processor. https://en.wikipedia.org/wiki/Template_processor. Accessed: [2026.01.19]

2026
[80]

Jianyu Wu, Hao He, Wenxin Xiao, Kai Gao, and Minghui Zhou. 2022. Demystifying software release note issues on GitHub. InProceedings of the 30th IEEE/ACM International Conference on Program Comprehension(Virtual Event)(ICPC ’22). Association for Computing Machinery, New York, NY, USA, 602–613. doi:10.1145/3524610.3527919 , Vol. 1, No. 1, Article . Publicat...

work page doi:10.1145/3524610.3527919 2022

Showing first 80 references.