pith. machine review for the scientific record.

arxiv: 2604.10126 · v2 · submitted 2026-04-11 · 💻 cs.SE · cs.AI


MR-Coupler: Automated Metamorphic Test Generation via Functional Coupling Analysis


Pith reviewed 2026-05-10 15:50 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords metamorphic testing · metamorphic relations · functional coupling · automated test generation · large language models · software testing · oracle problem · mutation analysis

The pith

Functional coupling between methods in source code lets large language models automatically generate valid metamorphic test cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MR-Coupler to overcome the main obstacle in metamorphic testing: the manual effort required to define metamorphic relations that link different inputs and outputs. It does so by scanning source code for pairs of methods that exhibit functional coupling, using three readily computed features to select the most promising pairs without exhaustive search. Large language models are then prompted to turn those pairs into candidate metamorphic test cases, which are filtered and strengthened through test amplification and mutation analysis to cut false positives. On a benchmark of 100 human-written metamorphic test cases the method produces valid outputs for more than 90 percent of the tasks, while on 50 real-world bugs the generated cases detect 44 percent of the faults and reduce false alarms relative to prior automated approaches. If the technique works as described, metamorphic testing could move from a niche expert activity to a routine part of ordinary test suites.
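To make the idea of a metamorphic test case over a coupled method pair concrete, here is a minimal sketch. It is not taken from the paper: the pair `encode`/`decode` (thin wrappers around Python's standard `base64` module), the helper names, and the random input generator are all illustrative assumptions. The relation checked is the round trip `decode(encode(x)) == x`, which needs no expected output for any single input and can therefore be run over many generated inputs.

```python
import base64
import random


def encode(data: bytes) -> str:
    """First method of the (assumed) coupled pair."""
    return base64.b64encode(data).decode("ascii")


def decode(text: str) -> bytes:
    """Second method of the (assumed) coupled pair."""
    return base64.b64decode(text.encode("ascii"))


def round_trip_holds(data: bytes) -> bool:
    """Metamorphic relation: decoding an encoded value returns the input."""
    return decode(encode(data)) == data


def run_metamorphic_test(num_inputs: int = 200, seed: int = 0) -> int:
    """Check the relation on many random inputs; return the violation count."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(num_inputs):
        length = rng.randint(0, 64)
        data = bytes(rng.randrange(256) for _ in range(length))
        if not round_trip_holds(data):
            violations += 1
    return violations
```

A violation count above zero would signal a fault in one of the two methods, without anyone having written down the expected encoding of a particular input.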

Core claim

MR-Coupler identifies functionally coupled method pairs via three coupling features, prompts large language models to instantiate metamorphic relations for those pairs, and validates the resulting metamorphic test cases with test amplification and mutation analysis, yielding valid cases for over 90 percent of evaluated tasks and detecting 44 percent of real-world bugs.

What carries the argument

MR-Coupler, the pipeline that selects method pairs by functional coupling features, delegates relation instantiation to large language models, and applies test amplification plus mutation analysis for validation.
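The validation stage can be pictured with the sketch below. The summary does not give the paper's exact acceptance criteria, so the rule here is an assumption: a candidate relation is rejected if it fails on the original code for any amplified input (a likely false alarm), and kept only if it also fails on at least one mutant. The hex codec pair, the two hand-written mutants, the three candidate relations, and the `"weak"` label are hypothetical stand-ins.

```python
from typing import Callable, List, NamedTuple


class CodecPair(NamedTuple):
    encode: Callable[[bytes], str]
    decode: Callable[[str], bytes]


Relation = Callable[[CodecPair, bytes], bool]

# Original implementation and two hand-written mutants (assumed examples).
ORIGINAL = CodecPair(encode=lambda b: b.hex(), decode=bytes.fromhex)
MUTANTS = [
    CodecPair(encode=lambda b: b[:-1].hex(), decode=bytes.fromhex),
    CodecPair(encode=lambda b: b.hex(), decode=lambda s: bytes.fromhex(s)[::-1]),
]

# Stand-in for inputs produced by test amplification.
AMPLIFIED_INPUTS = [b"", b"a", b"ab", b"\x00\xff\x10"]


def mr_round_trip(pair: CodecPair, x: bytes) -> bool:
    return pair.decode(pair.encode(x)) == x


def mr_deterministic(pair: CodecPair, x: bytes) -> bool:
    return pair.encode(x) == pair.encode(x)


def mr_padding_ignored(pair: CodecPair, x: bytes) -> bool:
    # Deliberately invalid relation: appending a byte does change the encoding.
    return pair.encode(x + b"\x00") == pair.encode(x)


def validate(relation: Relation, original: CodecPair,
             mutants: List[CodecPair], inputs: List[bytes]) -> str:
    """Classify a candidate relation as 'rejected', 'weak', or 'kept'."""
    # Step 1: a relation that fails on the original code is a false alarm.
    if any(not relation(original, x) for x in inputs):
        return "rejected"
    # Step 2: a useful relation should be violated by at least one mutant.
    killed = [m for m in mutants if any(not relation(m, x) for x in inputs)]
    return "kept" if killed else "weak"
```

Under these assumptions `mr_round_trip` is kept, `mr_deterministic` passes everywhere and is flagged as weak, and `mr_padding_ignored` is rejected because it already fails on the original pair.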

Load-bearing premise

The three chosen features of functional coupling between methods reliably indicate pairs that possess useful metamorphic relations an LLM can formulate correctly.
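The premise can be illustrated with a small ranking sketch. The summary does not name the three features, so the ones used here (shared field access, a return type that feeds the other method's parameters, and a shared name suffix) are hypothetical placeholders, as are the `Method` record, the threshold, and the sample methods. The point is only that cheap static checks can shrink the set of pairs forwarded to the language model.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Method:
    name: str
    param_types: Tuple[str, ...]
    return_type: str
    fields_accessed: FrozenSet[str]


def common_suffix_len(a: str, b: str) -> int:
    """Length of the longest common suffix of two names."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


def coupling_score(a: Method, b: Method) -> int:
    """Count how many of the three (hypothetical) features the pair shows."""
    shares_state = bool(a.fields_accessed & b.fields_accessed)
    types_chain = a.return_type in b.param_types or b.return_type in a.param_types
    names_related = common_suffix_len(a.name, b.name) >= 4
    return int(shares_state) + int(types_chain) + int(names_related)


def select_pairs(methods: List[Method], threshold: int = 2):
    """Return (a, b, score) for pairs at or above the threshold, best first."""
    scored = [(a, b, coupling_score(a, b)) for a, b in combinations(methods, 2)]
    kept = [t for t in scored if t[2] >= threshold]
    return sorted(kept, key=lambda t: t[2], reverse=True)


METHODS = [
    Method("encode", ("bytes",), "str", frozenset({"alphabet"})),
    Method("decode", ("str",), "bytes", frozenset({"alphabet"})),
    Method("size", (), "int", frozenset({"count"})),
    Method("toString", (), "str", frozenset({"count"})),
]
```

With these sample methods, only the `encode`/`decode` pair clears the threshold, so one pair rather than six would be passed on for relation generation. Whether the paper's actual features behave this way is exactly what the premise asserts and what an ablation would test.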

What would settle it

Applying MR-Coupler to a fresh collection of 50 industrial programs and finding that the generated metamorphic test cases detect fewer than 25 percent of the injected or reported bugs would falsify the reported effectiveness.

Figures

Figures reproduced from arXiv: 2604.10126 by Congying Xu, Hengcheng Zhu, Jiarong Wu, Shing-Chi Cheung, Songqiang Chen, Valerio Terragni.

Figure 1: An overview of MR-Coupler.
Original abstract

Metamorphic testing (MT) is a widely recognized technique for alleviating the oracle problem in software testing. However, its adoption is hindered by the difficulty of constructing effective metamorphic relations (MRs), which often require domain-specific or hard-to-obtain knowledge. In this work, we propose a novel approach that leverages the functional coupling between methods, which is readily available in source code, to automatically construct MRs and generate metamorphic test cases (MTCs). Our technique, MR-Coupler, identifies functionally coupled method pairs, employs large language models to generate candidate MTCs, and validates them through test amplification and mutation analysis. In particular, we leverage three functional coupling features to avoid expensive enumeration of possible method pairs, and a novel validation mechanism to reduce false alarms. Our evaluation of MR-Coupler on 100 human-written MTCs and 50 real-world bugs shows that it generates valid MTCs for over 90% of tasks, improves valid MTC generation by 64.90%, and reduces false alarms by 36.56% compared to baselines. Furthermore, the MTCs generated by MR-Coupler detect 44% of the real bugs. Our results highlight the effectiveness of leveraging functional coupling for automated MR construction and the potential of MR-Coupler to facilitate the adoption of MT in practice. We also released the tool and experimental data to support future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper introduces MR-Coupler, a technique for automated metamorphic test case (MTC) generation. It identifies functionally coupled method pairs using three code-derived features to avoid exhaustive enumeration, employs large language models to generate candidate MTCs based on metamorphic relations, and validates candidates via test amplification combined with mutation analysis to reduce false alarms. Evaluation is performed on 100 human-written MTCs and 50 real-world bugs, reporting >90% valid MTC generation, a 64.90% improvement in valid MTC generation over baselines, a 36.56% reduction in false alarms, and detection of 44% of the real bugs.

Significance. If the results hold, the work addresses a longstanding barrier to metamorphic testing adoption by automating MR construction from readily available source-code features rather than domain expertise. The pipeline integrates static analysis, LLM generation, and dynamic validation in a manner that appears internally consistent and externally validated via mutation analysis and real faults. The explicit release of the tool and experimental data is a clear strength that supports reproducibility and follow-on research.

minor comments (2)
  1. The abstract and evaluation summary report concrete percentages (e.g., 64.90% improvement, 36.56% false-alarm reduction) but do not name the exact baseline techniques or statistical tests used; adding this detail would strengthen the comparison claims.
  2. The three functional coupling features are central to narrowing the search space, yet the manuscript would benefit from a brief justification or reference to prior work on why these particular features (rather than alternatives) were selected.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work on MR-Coupler and the recommendation for minor revision. We are encouraged that the approach's potential to address longstanding challenges in metamorphic testing adoption through functional coupling analysis is recognized, along with the strengths in reproducibility via tool and data release.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper describes an empirical tool (MR-Coupler) that identifies functionally coupled method pairs via three code-derived features, uses LLMs to propose MTCs, and validates them via test amplification plus mutation analysis. All reported performance numbers (valid MTCs for over 90% of tasks, 64.90% improvement in valid MTC generation, 36.56% false-alarm reduction, 44% bug detection) are obtained by direct measurement against external artifacts: 100 human-written MTCs and 50 real-world bugs. No equations, predictions, or uniqueness claims reduce by construction to quantities defined inside the paper; the validation pipeline is explicitly external and falsifiable. No self-citation chains or ansatzes are load-bearing for the central result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on one domain assumption about functional coupling and contains no fitted free parameters or newly invented entities.

axioms (1)
  • domain assumption Functional coupling between methods, detectable from source code, indicates the existence of useful metamorphic relations.
    This premise is invoked to justify why the three coupling features can be used to select candidate method pairs without exhaustive search.



Published in Proc. ACM Softw. Eng., Vol. 3, No. FSE, Article FSE206. Publication date: July 2026.