arxiv: 2604.10126 · v2 · submitted 2026-04-11 · 💻 cs.SE · cs.AI


MR-Coupler: Automated Metamorphic Test Generation via Functional Coupling Analysis

Congying Xu , Hengcheng Zhu , Songqiang Chen , Jiarong Wu , Valerio Terragni , Shing-Chi Cheung


Pith reviewed 2026-05-10 15:50 UTC · model grok-4.3

classification 💻 cs.SE cs.AI

keywords metamorphic testing · metamorphic relations · functional coupling · automated test generation · large language models · software testing · oracle problem · mutation analysis


The pith

Functional coupling between methods in source code lets large language models automatically generate valid metamorphic test cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MR-Coupler to overcome the main obstacle in metamorphic testing: the manual effort required to define metamorphic relations that link different inputs and outputs. It scans source code for pairs of methods that exhibit functional coupling, using three coupling features to select the most promising pairs without exhaustive enumeration. Large language models are then prompted to turn those pairs into candidate metamorphic test cases, which are validated through test amplification and mutation analysis to cut false alarms. Evaluated on 100 human-written metamorphic test cases and 50 real-world bugs, the method generates valid test cases for over 90 percent of tasks, improves valid test-case generation by 64.90 percent and reduces false alarms by 36.56 percent relative to baselines, and detects 44 percent of the real bugs. If the technique works as described, metamorphic testing could move from a niche expert activity to a routine part of ordinary test suites.
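To make the idea concrete, here is a minimal sketch of the kind of metamorphic test a coupled method pair can support. It is not taken from the paper: the `encode`/`decode` pair, the round-trip relation, and the helper names `round_trip_holds` and `check_relation` are all assumptions chosen for illustration, and Python is used only for brevity.

```python
import base64
from typing import Iterable, List


def encode(data: bytes) -> str:
    """First method of the (hypothetical) coupled pair."""
    return base64.b64encode(data).decode("ascii")


def decode(text: str) -> bytes:
    """Second method of the pair; functionally the inverse of encode."""
    return base64.b64decode(text.encode("ascii"))


def round_trip_holds(data: bytes) -> bool:
    """Metamorphic relation (assumed for illustration): decode(encode(x)) == x."""
    return decode(encode(data)) == data


def check_relation(inputs: Iterable[bytes]) -> List[bytes]:
    """Return the inputs that violate the relation; an empty list means no alarm."""
    return [x for x in inputs if not round_trip_holds(x)]


# The same relation is reused across diverse inputs without per-input expected values.
violations = check_relation([b"", b"a", b"hello world", bytes(range(256))])
```

The relation needs no expected output for any single input, which is what lets it sidestep the oracle problem: the same check can be rerun on arbitrary new inputs.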

Core claim

MR-Coupler identifies functionally coupled method pairs via three coupling features, prompts large language models to instantiate metamorphic relations for those pairs, and validates the resulting metamorphic test cases with test amplification and mutation analysis, yielding valid cases for over 90 percent of evaluated tasks and detecting 44 percent of real-world bugs.

What carries the argument

MR-Coupler, the pipeline that selects method pairs by functional coupling features, delegates relation instantiation to large language models, and applies test amplification plus mutation analysis for validation.
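One plausible reading of the validation stage can be sketched as follows. The paper's actual criteria are not given in this summary, so the three-way verdict (`false alarm`, `weak`, `valid`), the toy shift cipher, and the hand-written mutants are all hypothetical stand-ins. In this sketch a candidate relation is rejected if it fails on the unmodified code for any amplified input, and kept only if at least one mutant violates it.

```python
from typing import Callable, List, NamedTuple


class Pair(NamedTuple):
    """A coupled method pair under test (hypothetical example)."""
    encode: Callable[[str], str]
    decode: Callable[[str], str]


def shift(text: str, k: int) -> str:
    """Rotate lowercase ASCII letters by k positions."""
    return "".join(chr((ord(c) - 97 + k) % 26 + 97) for c in text)


ORIGINAL = Pair(lambda s: shift(s, 3), lambda s: shift(s, -3))

# Hand-written mutants standing in for the output of a mutation tool.
MUTANTS = [
    Pair(lambda s: shift(s, 3), lambda s: shift(s, -2)),  # decode off by one
    Pair(lambda s: shift(s, 4), lambda s: shift(s, -3)),  # encode off by one
]

# Amplified inputs: the same relation is exercised on several inputs.
INPUTS = ["", "a", "abc", "xyz", "hello"]

Relation = Callable[[Pair, str], bool]


def round_trip(p: Pair, s: str) -> bool:
    # Genuine property of the pair: decoding an encoding restores the input.
    return p.decode(p.encode(s)) == s


def same_output(p: Pair, s: str) -> bool:
    # Not a real property of the pair: it fails on correct code.
    return p.encode(s) == p.decode(s)


def keeps_length(p: Pair, s: str) -> bool:
    # True on correct code, but too weak to expose either mutant.
    return len(p.encode(s)) == len(s)


def holds(rel: Relation, p: Pair, inputs: List[str]) -> bool:
    return all(rel(p, s) for s in inputs)


def classify(rel: Relation) -> str:
    if not holds(rel, ORIGINAL, INPUTS):
        return "false alarm"  # fails on the unmodified code
    if any(not holds(rel, m, INPUTS) for m in MUTANTS):
        return "valid"        # passes on the original, kills a mutant
    return "weak"             # passes everywhere, reveals nothing


verdicts = {r.__name__: classify(r) for r in (round_trip, same_output, keeps_length)}
```

Here `round_trip` is classified as `valid`, `same_output` as `false alarm`, and `keeps_length` as `weak`. Running on several inputs is what exposes `same_output`, since it happens to hold for the empty string.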

Load-bearing premise

The three chosen features of functional coupling between methods reliably indicate pairs that possess useful metamorphic relations an LLM can formulate correctly.
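The role of such features can be illustrated with a small ranking sketch. The abstract does not name the three features, so the two signals used here (same owning class, and a type flowing from one method's return value into the other's parameters) are placeholders rather than the paper's features; `Method`, `score`, and `candidate_pairs` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Method:
    """Minimal static description of a method (hypothetical schema)."""
    owner: str
    name: str
    params: Tuple[str, ...]
    returns: str


def same_owner(a: Method, b: Method) -> bool:
    return a.owner == b.owner


def type_flow(a: Method, b: Method) -> bool:
    # One method's return type can be fed into the other's parameters.
    return a.returns in b.params or b.returns in a.params


def score(a: Method, b: Method) -> int:
    # Placeholder signals only; not the features used by MR-Coupler.
    return int(same_owner(a, b)) + int(type_flow(a, b))


def candidate_pairs(methods: List[Method], min_score: int = 2) -> List[Tuple[str, str]]:
    return [
        (a.name, b.name)
        for a, b in combinations(methods, 2)
        if score(a, b) >= min_score
    ]


METHODS = [
    Method("Codec", "encode", ("bytes",), "str"),
    Method("Codec", "decode", ("str",), "bytes"),
    Method("Codec", "version", (), "int"),
    Method("Logger", "log", ("str",), "None"),
]

all_pairs = len(list(combinations(METHODS, 2)))  # n * (n - 1) / 2 unordered pairs
selected = candidate_pairs(METHODS)
```

With four methods there are six unordered pairs, and only `("encode", "decode")` clears the threshold. Because the number of pairs grows quadratically with the number of methods, cheap static scoring of this kind limits how many pairs need the more expensive LLM and validation steps. Whether the paper's features select pairs that actually carry useful relations is exactly what the premise above assumes.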

What would settle it

Applying MR-Coupler to a fresh collection of 50 industrial programs and finding that the generated metamorphic test cases detect fewer than 25 percent of the injected or reported bugs would falsify the reported effectiveness.

Figures

Figures reproduced from arXiv: 2604.10126 by Congying Xu, Hengcheng Zhu, Jiarong Wu, Shing-Chi Cheung, Songqiang Chen, Valerio Terragni.

**Figure 1.** An overview of MR-Coupler.

Original abstract

Metamorphic testing (MT) is a widely recognized technique for alleviating the oracle problem in software testing. However, its adoption is hindered by the difficulty of constructing effective metamorphic relations (MRs), which often require domain-specific or hard-to-obtain knowledge. In this work, we propose a novel approach that leverages the functional coupling between methods, which is readily available in source code, to automatically construct MRs and generate metamorphic test cases (MTCs). Our technique, MR-Coupler, identifies functionally coupled method pairs, employs large language models to generate candidate MTCs, and validates them through test amplification and mutation analysis. In particular, we leverage three functional coupling features to avoid expensive enumeration of possible method pairs, and a novel validation mechanism to reduce false alarms. Our evaluation of MR-Coupler on 100 human-written MTCs and 50 real-world bugs shows that it generates valid MTCs for over 90% of tasks, improves valid MTC generation by 64.90%, and reduces false alarms by 36.56% compared to baselines. Furthermore, the MTCs generated by MR-Coupler detect 44% of the real bugs. Our results highlight the effectiveness of leveraging functional coupling for automated MR construction and the potential of MR-Coupler to facilitate the adoption of MT in practice. We also released the tool and experimental data to support future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MR-Coupler shows a workable pipeline that prunes method pairs with three code-derived coupling features, uses an LLM for candidate MTCs, and validates via amplification plus mutation analysis, delivering over 90% valid cases and 44% bug detection on the reported sets.


The paper's core contribution is a concrete pipeline for metamorphic test generation: static analysis identifies functionally coupled method pairs, an LLM proposes relations and test cases for the reduced set, and a dedicated amplification-plus-mutation step filters invalid outputs. The combination is presented as new relative to the cited metamorphic testing work, and the evaluation on 100 human-written MTCs plus 50 real bugs backs it with numbers: valid MTCs for over 90 percent of tasks, a 64.90 percent improvement in valid MTC generation over baselines, 36.56 percent fewer false alarms, and detection of 44 percent of the bugs. The authors also release the tool and data, which helps anyone wanting to reproduce or extend the work.

The pruning via the three coupling features looks like a reasonable engineering choice that avoids brute-force enumeration without adding heavy machinery, and the validation layer directly targets the false-alarm problem that often plagues LLM-generated tests. The use of external mutation analysis and real faults rather than self-defined metrics keeps the claims from being circular.

One soft spot is that the reported gains still rest on LLM behavior, which can shift with prompt wording or model version, so the 64.90 percent lift might not reproduce exactly across runs even with the released artifacts. The 50 bugs come from a limited set of projects, so claims about broader practical impact are plausible but not yet stress-tested at scale.

The work is aimed at software testing researchers and tool builders who already know metamorphic testing but need help automating it. A reader working on automated oracle-free testing would find the feature choices and validation details worth examining. It deserves a serious referee because the approach is internally consistent, the evaluation uses external oracles where possible, and the artifacts lower the barrier to checking the numbers.

Referee Report

0 major / 2 minor

Summary. The paper introduces MR-Coupler, a technique for automated metamorphic test case (MTC) generation. It identifies functionally coupled method pairs using three code-derived features to avoid exhaustive enumeration, employs large language models to generate candidate MTCs based on metamorphic relations, and validates candidates via test amplification combined with mutation analysis to reduce false alarms. Evaluation is performed on 100 human-written MTCs and 50 real-world bugs, reporting >90% valid MTC generation, a 64.90% improvement in valid MTC generation over baselines, a 36.56% reduction in false alarms, and detection of 44% of the real bugs.

Significance. If the results hold, the work addresses a longstanding barrier to metamorphic testing adoption by automating MR construction from readily available source-code features rather than domain expertise. The pipeline integrates static analysis, LLM generation, and dynamic validation in a manner that appears internally consistent and externally validated via mutation analysis and real faults. The explicit release of the tool and experimental data is a clear strength that supports reproducibility and follow-on research.

minor comments (2)

The abstract and evaluation summary report concrete percentages (e.g., 64.90% improvement, 36.56% false-alarm reduction) but do not name the exact baseline techniques or statistical tests used; adding this detail would strengthen the comparison claims.
The three functional coupling features are central to narrowing the search space, yet the manuscript would benefit from a brief justification or reference to prior work on why these particular features (rather than alternatives) were selected.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work on MR-Coupler and the recommendation for minor revision. We are encouraged that the approach's potential to address longstanding challenges in metamorphic testing adoption through functional coupling analysis is recognized, along with the strengths in reproducibility via tool and data release.

Circularity Check

0 steps flagged

No significant circularity identified


The paper describes an empirical tool (MR-Coupler) that identifies functionally coupled method pairs via three code-derived features, uses LLMs to propose MTCs, and validates them via test amplification plus mutation analysis. All reported performance numbers (over 90% valid MTCs, 64.90% improvement, 36.56% false-alarm reduction, 44% bug detection) are obtained by direct measurement against external artifacts: 100 human-written MTCs and 50 real-world bugs. No equations, predictions, or uniqueness claims reduce by construction to quantities defined inside the paper; the validation pipeline is explicitly external and falsifiable. No self-citation chains or ansatzes are load-bearing for the central result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on one domain assumption about functional coupling and contains no fitted free parameters or newly invented entities.

axioms (1)

domain assumption: Functional coupling between methods, detectable from source code, indicates the existence of useful metamorphic relations.
This premise is invoked to justify why the three coupling features can be used to select candidate method pairs without exhaustive search.



Proc. ACM Softw. Eng., Vol. 3, No. FSE, Article FSE206. Publication date: July 2026.