pith. machine review for the scientific record.

arxiv: 2604.10345 · v1 · submitted 2026-04-11 · 💻 cs.SE

Recognition: unknown

Fine-grained Multi-Document Extraction and Generation of Code Change Rationale

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:25 UTC · model grok-4.3

classification 💻 cs.SE
keywords code change rationale · multi-document extraction · LLM for software engineering · commit analysis · software maintenance · empirical study · user study

The pith

Code change rationales are fragmented across commit messages, issues, and pull requests, so an LLM tool can identify relevant sentences and generate useful summaries from multiple documents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows through an empirical study of 63 commits that goals appear mostly in commit messages and pull requests while needs and alternatives appear more in issues, with no single artifact type holding everything. It then presents ARGUS, which uses large language models to scan all artifacts linked to a commit, pull out sentences about goals, needs, and alternatives, and produce short summaries. Evaluation found the best version of ARGUS reached 93 percent recall in finding rationale sentences and produced summaries judged accurate, while a study of twelve Java developers indicated the summaries helped with code review, documentation, and debugging. A sympathetic reader would care because developers routinely need to reconstruct why changes were made yet face scattered and incomplete records that slow down maintenance and review work.

Core claim

An empirical study of 63 commits from five open-source Java projects revealed that rationale components are highly fragmented: commit messages and pull requests mainly capture goals, needs and alternatives are more often in issues and PRs, and other components appear rarely and outside commit messages. No single artifact type contains all components. ARGUS, an LLM-based method, then identifies sentences expressing goal, need, and alternative across a commit's artifacts and synthesizes concise rationale summaries. On the studied commits the strongest variant achieved 51.4 percent precision and 93.2 percent recall for identification, while the generated summaries were rated accurate; a user study with 12 Java developers found the summaries helpful for code review, documentation, and debugging.
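
The visible text reports precision and recall but no combined score; for orientation, the implied balanced F1 works out to roughly 0.66. A minimal sketch of that arithmetic, using only the numbers quoted above:

```python
# Illustrative arithmetic only: the F1 implied by the reported identification
# metrics (51.4% precision, 93.2% recall). F1 itself is not quoted in the text.
precision = 0.514
recall = 0.932

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"implied F1 ≈ {f1:.3f}")  # ≈ 0.663
```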

What carries the argument

ARGUS, the LLM pipeline that locates sentences stating a change's goal, need, or alternative across commit messages, issues, and pull requests, then produces concise synthesized summaries.
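
As a concrete picture of what such a pipeline involves, here is a minimal sketch under stated assumptions: the `call_llm` helper, the prompt wording, the labels, and the naive sentence splitter are illustrative placeholders, not the authors' implementation (the paper describes its own prompts and models).

```python
# Minimal sketch of an ARGUS-style multi-document pipeline, assuming a generic
# call_llm(prompt) -> str helper. Everything here is illustrative scaffolding.
from dataclasses import dataclass

RATIONALE_LABELS = ("goal", "need", "alternative")

@dataclass
class Artifact:
    kind: str   # "commit_message" | "issue" | "pull_request"
    text: str

def split_sentences(text: str) -> list[str]:
    # Naive splitter; any sentence segmenter could be substituted here.
    return [s.strip() for s in text.replace("\n", " ").split(".") if s.strip()]

def identify_rationale(artifacts: list[Artifact], call_llm) -> dict[str, list[str]]:
    """Classify each sentence in each linked artifact as goal/need/alternative/none."""
    found: dict[str, list[str]] = {label: [] for label in RATIONALE_LABELS}
    for artifact in artifacts:
        for sentence in split_sentences(artifact.text):
            prompt = (
                f"Artifact type: {artifact.kind}\n"
                f"Sentence: {sentence}\n"
                "Does this sentence express the change's goal, need, or alternative? "
                "Answer one of: goal, need, alternative, none."
            )
            label = call_llm(prompt).strip().lower()
            if label in found:
                found[label].append(sentence)
    return found

def summarize_rationale(found: dict[str, list[str]], call_llm) -> str:
    """Synthesize one concise rationale summary from the identified sentences."""
    evidence = "\n".join(f"[{k}] {s}" for k, v in found.items() for s in v)
    return call_llm("Write a short rationale summary of this code change:\n" + evidence)
```

The two-step shape (identify rationale sentences across all linked artifacts, then synthesize a summary) mirrors the description above; everything else in the sketch is scaffolding.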

If this is right

  • Tools for code review and maintenance must combine information from commit messages, issues, and pull requests rather than relying on any one source.
  • Developers can receive automatically generated rationale summaries that reduce the time spent searching scattered records for why a change was made.
  • Rationale summaries produced this way are perceived as helpful specifically for code review, writing documentation, and diagnosing bugs.
  • LLM-based extraction can achieve high recall in locating rationale sentences even when precision is moderate.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Integrating ARGUS-style summaries directly into version-control interfaces could make rationale visible at the moment a developer views a commit.
  • The same multi-document approach might be tested on larger sets of commits to check whether the observed fragmentation pattern holds beyond the five studied projects.
  • Measuring whether developers complete review or debugging tasks faster or with fewer errors when given these summaries would give a stronger test of practical value than perception ratings alone.

Load-bearing premise

The 63 commits from five Java projects sufficiently represent how rationale is distributed in typical development, and the LLM sentence identification works without major loss of accuracy on other projects or languages.

What would settle it

Running the same multi-document analysis on commits from a different programming language or from closed-source repositories and finding markedly different distributions of goals, needs, and alternatives across artifact types.

Figures

Figures reproduced from arXiv: 2604.10345 by Antonio Mastropaolo, Antu Saha, Mehedi Sun, Nadeeshan De Silva, Oscar Chaparro.

Figure 1
Figure 1. A Motivating Example from the OkHttp Project [2]. Artifacts shown for commit 4c86085: Commit Message, Class Javadocs, Pull Request (PR), Issue. view at source ↗
Figure 2
Figure 2. ARGUS's Architecture. view at source ↗
Figure 3
Figure 3. Structure of the developed prompts for both of … view at source ↗
read the original abstract

Understanding the reasons behind past code changes is critical for many software engineering tasks, including refactoring and reviewing code, diagnosing bugs, and implementing new features. Unfortunately, locating and reconstructing this rationale can be difficult for developers because the information is often fragmented, inconsistently documented, and scattered across different artifacts such as commit messages, issue reports, and PRs. In this paper, we address this challenge in two steps. First, we conduct an empirical study of 63 commits from five open-source Java projects to analyze how rationale components (e.g., a change's goal, need, and alternative) are distributed across artifacts. We find that the rationale is highly fragmented: commit messages and pull requests primarily capture goals, while needs and alternatives are more often found in issues and PRs. Other components are scarce but found in artifacts other than commit messages. No single artifact type captures all components, underscoring the need for cross-document reasoning and synthesis. Second, we introduce ARGUS, an LLM-based approach that identifies sentences expressing goal, need, and alternative across a commit's artifacts and creates concise rationale summaries to support code comprehension and maintenance tasks. We evaluated ARGUS on the 63 commits and compared its performance against baseline variants. The best-performing version achieved 51.4% precision and 93.2% recall for rationale identification, while producing rationale summaries rated as accurate. A user study with 12 Java developers further showed that these summaries were perceived as useful and helpful for tasks such as code review, documentation, and debugging. Our results highlight the need for multi-document reasoning in capturing rationale and demonstrate the potential of ARGUS to help developers understand and maintain software systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper conducts an empirical study of 63 commits from five open-source Java projects to analyze the distribution of code change rationale components (e.g., goals, needs, alternatives) across artifacts such as commit messages, issues, and PRs, finding high fragmentation with no single artifact capturing all components. It then introduces ARGUS, an LLM-based approach to identify relevant sentences across multi-document artifacts and generate concise rationale summaries. ARGUS is evaluated on the same 63 commits, with the best variant achieving 51.4% precision and 93.2% recall for identification, accurate summaries, and a user study with 12 Java developers rating the summaries as useful for code review, documentation, and debugging.

Significance. If the results hold under broader validation, the work contributes concrete empirical data on rationale fragmentation and demonstrates a practical LLM-based multi-document synthesis method that could support key SE tasks. The empirical counts across artifact types and the developer user study are strengths that provide grounded evidence rather than purely synthetic claims.

major comments (1)
  1. [Evaluation] The evaluation of ARGUS reports 51.4% precision and 93.2% recall based solely on the same 63 commits from the empirical study, with no held-out test set, cross-project validation, or external commits described. This setup is load-bearing for the central performance and usefulness claims, as the metrics and developer perceptions could be artifacts of the narrow sample (five Java projects) rather than evidence of broader applicability.
minor comments (2)
  1. [Abstract] The abstract and evaluation description provide limited detail on inter-rater agreement for the manual rationale labeling, the specific LLM prompts or engineering choices, and how the baseline variants were implemented; adding these would improve reproducibility and assessment of the identification results.
  2. The paper could strengthen the threats-to-validity discussion by explicitly addressing the representativeness of the 63 commits and potential domain shift beyond the studied Java projects.

Simulated Author's Rebuttal

1 response · 0 unresolved

Thank you for the opportunity to respond to the referee's report on our manuscript. We value the constructive criticism regarding the evaluation of ARGUS. We address the major comment below and indicate the changes we will make in the revised version.

read point-by-point responses
  1. Referee: [Evaluation] The evaluation of ARGUS reports 51.4% precision and 93.2% recall based solely on the same 63 commits from the empirical study, with no held-out test set, cross-project validation, or external commits described. This setup is load-bearing for the central performance and usefulness claims, as the metrics and developer perceptions could be artifacts of the narrow sample (five Java projects) rather than evidence of broader applicability.

    Authors: We acknowledge that the performance metrics for ARGUS were computed on the same 63 commits used in the empirical study, and that no held-out test set or cross-project validation was performed. This design choice stems from the fact that the empirical study was necessary to identify and annotate the rationale components across artifacts before developing and testing the extraction and generation approach. However, we agree that this limits the strength of the claims regarding broader applicability. In the revised manuscript, we will add a limitations section explicitly discussing the small sample size and lack of external validation. Additionally, we will perform and report a 5-fold cross-validation on the 63 commits to provide a more robust estimate of performance, and we will clarify that the user study with 12 developers offers qualitative insights rather than quantitative generalizability. We believe these revisions will address the core concern while preserving the contributions of the empirical analysis. revision: partial
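
For concreteness, the cross-validation the rebuttal commits to could be organized as in the sketch below; the commit identifiers and the `evaluate_fold` helper are hypothetical stand-ins, and the split is illustrative rather than the authors' actual protocol.

```python
# Illustrative 5-fold split over 63 annotated commits. evaluate_fold is a
# hypothetical helper that would re-run identification on held-out commits
# and score precision/recall against the study's annotations.
from sklearn.model_selection import KFold

commit_ids = [f"commit_{i:02d}" for i in range(63)]  # stand-ins for real SHAs

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (dev_idx, test_idx) in enumerate(kfold.split(commit_ids), start=1):
    dev = [commit_ids[i] for i in dev_idx]        # used to tune prompts/examples
    held_out = [commit_ids[i] for i in test_idx]  # scored once, never tuned on
    # precision, recall = evaluate_fold(dev, held_out)
    print(f"fold {fold}: {len(dev)} dev commits, {len(held_out)} held-out commits")
```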

Circularity Check

0 steps flagged

No significant circularity; empirical metrics and user study are independent of method definition

full rationale

The paper conducts a separate empirical study of rationale distribution across 63 commits, then applies an LLM-based identification and summarization method (ARGUS) whose outputs are compared against ground-truth annotations from that study to compute precision/recall. This is standard evaluation practice and does not reduce the reported 51.4% precision / 93.2% recall or user-study usefulness to a definitional tautology, fitted parameter, or self-citation chain. No equations, ansatzes, or uniqueness theorems appear; the LLM component operates independently of the specific performance numbers, and results remain falsifiable through replication on new commits or projects. The evaluation uses the study data for both analysis and testing, but this does not constitute circularity per the enumerated patterns.
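
For readers who want the mechanics spelled out, identification precision and recall of this kind reduce to set overlap between predicted and annotated rationale sentences; the toy values below are invented for illustration and are not the paper's data.

```python
# Toy example of sentence-level precision/recall against ground-truth
# annotations; the sentences are invented for illustration only.
gold = {"Drop ALPN support.", "startHandshake is not thread-safe."}
predicted = {"Drop ALPN support.", "startHandshake is not thread-safe.", "Update the changelog."}

true_positives = len(gold & predicted)
precision = true_positives / len(predicted)  # 2/3 ≈ 0.67
recall = true_positives / len(gold)          # 2/2 = 1.00
print(f"precision={precision:.2f}, recall={recall:.2f}")
```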

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on standard assumptions about what constitutes a rationale component and on the representativeness of the sampled commits; no free parameters or invented physical entities are introduced.

axioms (2)
  • domain assumption Rationale for a code change can be decomposed into goal, need, and alternative components that are expressed in natural language across commit messages, issues, and PRs.
    Invoked in the empirical study design and in the definition of what ARGUS extracts.
  • domain assumption LLM-based sentence classification can reliably surface these components when given the full set of artifacts for a commit.
    Central to the ARGUS approach and its evaluation metrics.

pith-pipeline@v0.9.0 · 5614 in / 1454 out tokens · 23422 ms · 2026-05-10T15:25:49.356345+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

73 extracted references · 14 canonical work pages

  1. [1]

    [n. d.]. JUnit4 commit cec4a6baf600b8dee3d1318c242a67b56874288a. https://github.com/junit-team/junit4/commit/cec4a6baf600b8dee3d1318c242a67b56874288a. Accessed: 2026-03-27

  2. [2]

    [n. d.]. OkHttp commit 4c86085429edbeef0a383941936ee7b64cc3805e. https://github.com/square/okhttp/commit/4c86085429edbeef0a383941936ee7b64cc3805e. Accessed: 2026-03-27

  3. [3]

    2025. Apache-Dubbo. https://dubbo.apache.org/

  4. [4]

    2025. Comment Parser. https://pypi.org/project/comment-parser/

  5. [5]

    2025. JUnit4. https://junit.org/junit4/

  6. [6]

    2025. OkHttp. https://square.github.io/okhttp/

  7. [7]

    2025. Online Replication Package. https://anonymous.4open.science/r/Fine-grained-Multi-Document-Extraction-and-Generation-of-Code-Change-Rationale-8BC6/README.md

  8. [8]

    2025. Retrofit. https://square.github.io/retrofit/

  9. [9]

    2025. Spacy Model: en_core_web_trf. https://spacy.io/models/en

  10. [10]

    2025. Spring-Boot. https://spring.io/projects/spring-boot

  11. [11]

    Khadijah Al Safwan, Mohammed Elarnaoty, and Francisco Servant. 2022. Developers’ need for the rationale of code commits: An in-breadth and in-depth study. Journal of Systems and Software 189 (2022), 111320

  12. [12]

    Rana Alkadhi, Manuel Nonnenmacher, Emitza Guzman, and Bernd Bruegge. 2018. How do developers discuss rationale?. In Proceedings of the IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER’18). 357–369

  13. [13]

    A.I. Anton. 1996. Goal-based requirements analysis. In Proceedings of the Second International Conference on Requirements Engineering. 136–144. https://doi.org/10.1109/ICRE.1996.491438

  14. [14]

    Alberto Bacchelli and Christian Bird. 2013. Expectations, outcomes, and challenges of modern code review. In 2013 35th International Conference on Software Engineering (ICSE). IEEE, 712–721

  15. [15]

    Tingting Bi, Wei Ding, Peng Liang, and Antony Tang. 2021. Architecture information communication in two OSS projects: The why, who, when, and what. Journal of Systems and Software 181 (2021), 111035

  16. [16]

    Janet E Burge and David C Brown. 2008. Software engineering using rationale. Journal of Systems and Software 81, 3 (2008), 395–413

  17. [17]

    Janet E. Burge, John M. Carroll, Raymond McCall, and Ivan Mistrik. 2008. Rationale-Based Software Engineering. Springer Berlin Heidelberg

  18. [18]

    Francesco Casillo, Antonio Mastropaolo, Gabriele Bavota, Vincenzo Deufemia, and Carmine Gravino. 2025. Towards Generating the Rationale for Code Changes. In 2025 IEEE/ACM 33rd International Conference on Program Comprehension (ICPC). IEEE Computer Society, 327–338

  19. [19]

    Oscar Chaparro, Jing Lu, Fiorella Zampetti, Laura Moreno, Massimiliano Di Penta, Andrian Marcus, Gabriele Bavota, and Vincent Ng. 2017. Detecting missing information in bug descriptions. In Proceedings of the 2017 11th joint meeting on foundations of software engineering. 396–407

  20. [20]

    Xiangping Chen, Yangzi Li, Zhicao Tang, Yuan Huang, Haojie Zhou, Mingdong Tang, and Zibin Zheng. 2024. ESGen: Commit Message Generation Based on Edit Sequence of Code Change. In Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension. 112–124

  21. [21]

    Mihai Codoban, Sruti Srinivasa Ragavan, Danny Dig, and Brian Bailey. 2015. Software history under the lens: A study on why and how developers examine it. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution (ICSME’15). 1–10

  22. [22]

    Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and psychological measurement 20, 1 (1960), 37–46

  23. [23]

    Jacob Cohen. 1968. Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological bulletin 70, 4 (1968), 213

  24. [24]

    Luis Fernando Cortés-Coy, Mario Linares-Vásquez, Jairo Aponte, and Denys Poshyvanyk. 2014. On automatically generating commit messages via summarization of source code changes. In 2014 IEEE 14th International Working Conference on Source Code Analysis and Manipulation. IEEE, 275–284

  25. [25]

    Mouna Dhaouadi, Bentley Oakes, and Michalis Famelis. 2025. Automated Extraction and Analysis of Developer’s Rationale in Open Source Software. Proceedings of the ACM on Software Engineering 2, FSE (2025), 2548–2570

  26. [26]

    Mouna Dhaouadi, Bentley James Oakes, and Michalis Famelis. 2022. End-to-end rationale reconstruction. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. 1–5

  27. [27]

    Mouna Dhaouadi, Bentley James Oakes, and Michalis Famelis. 2024. Rationale dataset and analysis for the commit messages of the Linux kernel out-of-memory killer. In Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension. 415–425

  28. [28]

    Mouna Dhaouadi, Bentley James Oakes, and Michalis Famelis. 2025. CoMRAT: Commit Message Rationale Analysis Tool. 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR) (2025), 831–835. https://api.semanticscholar.org/CorpusID:278777370

  29. [29]

    Andrea Di Sorbo, Sebastiano Panichella, Corrado A Visaggio, Massimiliano Di Penta, Gerardo Canfora, and Harald C Gall. 2019. Exploiting natural language structures in software informal documentation. IEEE Transactions on Software Engineering 47, 8 (2019), 1587–1604

  30. [31]

    Allen H Dutoit and Barbara Paech. 2001. Rationale management in software engineering. In Handbook of Software Engineering and Knowledge Engineering: Volume I: Fundamentals. World Scientific, 787–815

  31. [32]

    Felipe Ebert, Fernando Castor, Nicole Novielli, and Alexander Serebrenik. 2019. Confusion in Code Reviews: Reasons, Impacts, and Coping Strategies. In Proceedings of the IEEE 26th International Conference on Software Analysis, Evolution and Reengineering. 49–60

  32. [33]

    Thomas Fritz and Gail C. Murphy. 2010. Using information fragments to answer the questions developers ask. In Proceedings of the ACM/IEEE 32nd International Conference on Software Engineering, Vol. 1. 175–184

  33. [34]

    Fabian Gilson and Vincent Englebert. 2011. Rationale, decisions and alternatives traceability for architecture design. In Proceedings of the 5th European Conference on Software Architecture: Companion Volume. 1–9

  34. [35]

    Daqing Hou, Chandan Raj Rupakheti, and H. James Hoover. 2008. Documenting and Evaluating Scattered Concerns for Framework Usability: A Case Study. In 2008 15th Asia-Pacific Software Engineering Conference. 213–220. https://doi.org/10.1109/APSEC.2008.39

  35. [36]

    H. Kaiya, H. Horai, and M. Saeki. 2002. AGORA: attributed goal-oriented requirements analysis method. In Proceedings IEEE Joint International Conference on Requirements Engineering. 13–22. https://doi.org/10.1109/ICRE.2002.1048501

  36. [37]

    Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, and Ashish Sabharwal. 2022. Decomposed prompting: A modular approach for solving complex tasks. arXiv preprint arXiv:2210.02406 (2022)

  37. [38]

    Anja Kleebaum, Barbara Paech, Jan Ole Johanssen, and Bernd Bruegge. 2021. Continuous Rationale Visualization. In 2021 Working Conference on Software Visualization (VISSOFT). 33–43. https://doi.org/10.1109/VISSOFT52517.2021.00013

  38. [39]

    Andrew J Ko, Robert DeLine, and Gina Venolia. 2007. Information needs in collocated software development teams. In 29th International Conference on Software Engineering (ICSE’07). 344–353

  39. [40]

    Klaus Krippendorff. 2018. Content analysis: An introduction to its methodology. Sage publications

  40. [41]

    Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Ivan Rodkin, Dmitry Sorokin, Artyom Sorokin, and Mikhail Burtsev. 2024. Babilong: Testing the limits of llms with long context reasoning-in-a-haystack. Advances in Neural Information Processing Systems 37 (2024), 106519–106554

  41. [42]

    Thomas D. LaToza and Brad A. Myers. 2010. Hard-to-answer questions about code. In Evaluation and Usability of Programming Languages and Tools. Association for Computing Machinery, 1–6. http://doi.org/10.1145/1937117.1937125

  42. [43]

    Jintae Lee. 1991. Extending the Potts and Bruns model for recording design rationale. In Proceedings - 13th International Conference on Software Engineering. IEEE Computer Society, 114–115

  43. [44]

    Mosh Levy, Alon Jacoby, and Yoav Goldberg. 2024. Same task, more tokens: the impact of input length on the reasoning performance of large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 15339–15353

  44. [45]

    Jiawei Li and Iftekhar Ahmed. 2023. Commit message matters: Investigating impact and evolution of commit message quality. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 806–817

  45. [46]

    Jiawei Li, David Faragó, Christian Petrov, and Iftekhar Ahmed. 2024. Only diff is not enough: Generating commit messages leveraging reasoning and action of large language model. Proceedings of the ACM on Software Engineering 1, FSE (2024), 745–766

  46. [47]

    Jenny T Liang, Maryam Arab, Minhyuk Ko, Amy J Ko, and Thomas D LaToza. 2023. A qualitative study on the implementation design decisions of developers. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). 435–447

  47. [48]

    Yan Liang, Ying Liu, Chun Kit Kwong, and Wing Bun Lee. 2012. Learning the "Whys": Discovering design rationale using text mining - An algorithm perspective. Comput. Aided Des. 44, 10 (Oct. 2012), 916–930. https://doi.org/10.1016/j.cad.2011.08.002

  48. [49]

    Bo Lin, Shangwen Wang, Zhongxin Liu, Yepang Liu, Xin Xia, and Xiaoguang Mao. 2023. Cct5: A code-change-oriented pre-trained model. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1509–1521

  49. [50]

    Zhan Ling, Kang Liu, Kai Yan, Yifan Yang, Weijian Lin, Ting-Han Fan, Lingfeng Shen, Zhengyin Du, and Jiecao Chen. 2025. Longreason: A synthetic long-context reasoning benchmark via context expansion. arXiv preprint arXiv:2501.15089 (2025)

  50. [51]

    Abhinav Reddy Mandli, Saurabhsingh Rajput, and Tushar Sharma. 2025. COMET: Generating commit messages using delta graph context representation. Journal of Systems and Software 222 (2025), 112307

  51. [52]

    OpenAI. 2025. OpenAI o4-mini: Reasoning Language Model. https://en.wikipedia.org/wiki/OpenAI_o4-mini. Accessed: 2025-09-11

  52. [53]

    OpenAI. 2025. Text Embedding 3 Large. https://platform.openai.com/docs/models/text-embedding-3-large. Accessed: 2025-09-11

  53. [54]

    Luca Pascarella, Magiel Bruntink, and Alberto Bacchelli. 2019. Classifying code comments in Java software systems. Empirical Software Engineering 24, 3 (2019), 1499–1537

  54. [55]

    Luca Pascarella, Davide Spadini, Fabio Palomba, Magiel Bruntink, and Alberto Bacchelli. 2018. Information needs in contemporary code review. Proceedings of the ACM on human-computer interaction 2, CSCW (2018), 1–27

  55. [56]

    Pooja Rani, Sebastiano Panichella, Manuel Leuenberger, Andrea Di Sorbo, and Oscar Nierstrasz. 2021. How to identify class comment types? A multi-language approach for class comment classification. Journal of systems and software 181 (2021), 111047

  56. [57]

    Sarah Rastkar and Gail C. Murphy. 2013. Why did this code change?. In Proceedings of the 35th International Conference on Software Engineering (ICSE’13). 1193–1196

  57. [58]

    Michael Rath, Jacob Rendall, Jin L. C. Guo, Jane Cleland-Huang, and Patrick Mäder. 2018. Traceability in the wild: automatically augmenting incomplete trace links. In Proceedings of the 40th International Conference on Software Engineering (Gothenburg, Sweden) (ICSE ’18). Association for Computing Machinery, New York, NY, USA, 834–845. https://doi.org/10.114...

  58. [59]

    C. J. Van Rijsbergen. 1979. Information Retrieval (2nd ed.). Butterworth-Heinemann, USA

  59. [60]

    Tobias Roehm, Rebecca Tiarks, Rainer Koschke, and Walid Maalej. 2012. How do professional developers comprehend software?. In 2012 34th International Conference on Software Engineering (ICSE). IEEE, 255–265

  60. [61]

    Benjamin Rogers, James Gung, Yechen Qiao, and Janet E. Burge. 2012. Exploring techniques for rationale extraction from existing documents. In Proceedings of the 34th International Conference on Software Engineering (Zurich, Switzerland) (ICSE ’12). IEEE Press, 1313–1316

  61. [62]

    Khadijah Al Safwan and Francisco Servant. 2019. Decomposing the rationale of code commits: the software developer’s perspective. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, 397–408. https://dl.acm.org/doi/10.1145/3338906.3338979

  62. [63]

    Pankajeshwara Nand Sharma, Bastin Tony Roy Savarimuthu, and Nigel Stanger. 2021. Extracting Rationale for Open Source Software Development Decisions — A Study of Python Email Archives. In Proceedings of the IEEE/ACM 43rd International Conference on Software Engineering (ICSE’21). 1008–1019

  63. [64]

    Jonathan Sillito, Gail C. Murphy, and Kris De Volder. 2008. Asking and Answering Questions during a Programming Change Task. IEEE Transactions on Software Engineering 34, 4 (2008), 434–451

  64. [65]

    Adriana Meza Soria, Taylor Lopez, Elizabeth Seero, Negin Mashhadi, Emily Evans, Janet Burge, and André Van der Hoek. 2024. Characterizing software maintenance meetings: Information shared, discussion outcomes, and information captured. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering. 1–13

  65. [66]

    Antony Tang, Muhammad Ali Babar, Ian Gorton, and Jun Han. 2006. A survey of architecture design rationale. Journal of systems and software 79, 12 (2006), 1792–1804

  66. [67]

    Yida Tao, Yingnong Dang, Tao Xie, Dongmei Zhang, and Sunghun Kim. 2012. How do software engineers understand code changes? an exploratory study in industry. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering. Association for Computing Machinery...

  67. [68]

    Yingchen Tian, Yuxia Zhang, Klaas-Jan Stol, Lin Jiang, and Hui Liu. 2022. What makes a good commit message?. In Proceedings of the 44th International Conference on Software Engineering. 2389–2401

  68. [69]

    A. van Lamsweerde. 2001. Goal-oriented requirements engineering: a guided tour. In Proceedings Fifth IEEE International Symposium on Requirements Engineering. 249–262. https://doi.org/10.1109/ISRE.2001.948567

  69. [70]

    Yifan Wu, Yunpeng Wang, Ying Li, Wei Tao, Siyu Yu, Haowen Yang, Wei Jiang, and Jianguo Li. 2025. An Empirical Study on Commit Message Generation using LLMs via In-Context Learning. arXiv preprint arXiv:2502.18904 (2025)

  70. [71]

    Yuwei Zhang, Jayanth Srinivasa, Gaowen Liu, and Jingbo Shang. 2025. Attention reveals more than tokens: Training-free long-context reasoning with attention-guided retrieval. arXiv preprint arXiv:2503.09819 (2025)

  71. [72]

    Jiuang Zhao, Zitian Yang, Li Zhang, Xiaoli Lian, Donghao Yang, and Xin Tan. 2024. DRMiner: Extracting Latent Design Rationale from Jira Issue Logs. 2024 39th IEEE/ACM International Conference on Automated Software Engineering (ASE) (2024), 468–480

  72. [73]

    Xiyu Zhou, Ruiyin Li, Peng Liang, Beiqi Zhang, Mojtaba Shahin, Zengyang Li, and Chen Yang. 2025. Using LLMs in generating design rationale for software architecture decisions. ACM Transactions on Software Engineering and Methodology (2025)

  73. [74]

    Thomas Zimmermann, Rahul Premraj, Nicolas Bettenburg, Sascha Just, Adrian Schroter, and Cathrin Weiss. 2010. What makes a good bug report? IEEE Transactions on Software Engineering 36, 5 (2010), 618–643