Assessing Language Models for Salient Class Identification

Bo Xiong; Chaoran Cai; Chong Wang; Kaipeng Xiong; Peng Liang

arxiv: 2606.21629 · v1 · pith:NYUFP47Mnew · submitted 2026-06-19 · 💻 cs.SE


Pith reviewed 2026-06-26 13:27 UTC · model grok-4.3

classification 💻 cs.SE

keywords: salient class identification, language models, code review, commit analysis, software engineering, prompting strategies, Java commits, dataset construction


The pith

Language models identify salient classes in commits directly from text and outperform the strongest reproducible program-analysis baseline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether language models can pick out the main classes changed in a commit without AST parsing, dependency graphs, or any training. It builds ApacheJavaCM, a dataset of 7,911 Apache Java commits containing 25,914 labeled classes, then tests GPT-5.4, DeepSeek-V3.2, and the 9B-parameter Qwen3.5-9B under zero-shot, few-shot, and chain-of-thought prompting. The models beat the strongest reproducible baseline and stay stable across commit characteristics, and the small open-source model performs comparably to the large closed-source one under few-shot prompting. This matters for code review because it gives reviewers a direct entry point into complex changes without custom tooling.

Core claim

Language models prompted on raw commit text can identify salient classes in multi-class Java commits without feature engineering, graph construction, or model training, substantially outperforming the strongest reproducible state-of-the-art baseline while remaining stable across commit characteristics; a 9B-parameter open-source model under few-shot prompting matches the performance of a much larger closed-source model.

What carries the argument

Direct prompting of language models on commit messages and diffs to classify each modified class as salient or non-salient.
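
The direct-prompting setup described above can be sketched roughly as follows. This is only an illustration, not the authors' prompts: the prompt wording, the helper names `build_prompt`, `_format_commit`, and `parse_labels`, the JSON answer format, and the toy commit are all assumptions introduced here, since the paper's actual templates are not reproduced on this page. The model call itself is left out so that the sketch runs without any API.

```python
import json


def _format_commit(message, diffs):
    """Render one commit as plain text: message first, then one diff per class."""
    lines = ["Commit message: " + message]
    for class_name, diff in diffs.items():
        lines.append("Class " + class_name + ":\n" + diff)
    return "\n".join(lines)


def build_prompt(message, diffs, examples=()):
    """Assemble a zero-shot prompt, or a few-shot one when examples are given.

    diffs maps each modified class name to its diff text. examples is an
    optional sequence of (message, diffs, salient_class_names) tuples.
    All wording below is hypothetical, not taken from the paper.
    """
    parts = [
        "You are reviewing a Java commit that modifies several classes.",
        "A salient class is a primarily modified class whose change may "
        "induce modifications in the other classes.",
        'Answer with a JSON list of salient class names, e.g. ["Foo"].',
    ]
    for ex_message, ex_diffs, ex_salient in examples:
        parts.append(_format_commit(ex_message, ex_diffs))
        parts.append("Answer: " + json.dumps(sorted(ex_salient)))
    parts.append(_format_commit(message, diffs))
    parts.append("Answer:")
    return "\n\n".join(parts)


def parse_labels(reply, modified_classes):
    """Map a model reply to a salient / non-salient flag per modified class."""
    try:
        named = json.loads(reply)
    except json.JSONDecodeError:
        named = []  # an unparseable reply marks every class as non-salient
    if not isinstance(named, list):
        named = []
    named = {item for item in named if isinstance(item, str)}
    return {name: name in named for name in modified_classes}


# Toy commit, invented for this sketch.
diffs = {"Parser": "+ int depth;", "ParserTest": "+ assertEquals(1, depth);"}
prompt = build_prompt("Track nesting depth in Parser", diffs)
labels = parse_labels('["Parser"]', diffs)
```

Passing a non-empty `examples` sequence turns the same template into a few-shot prompt; a chain-of-thought variant would additionally ask for reasoning before the final JSON line and would need a correspondingly more tolerant parser.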

If this is right

Reviewers receive an immediate starting point when a commit touches many classes.
Salient-class identification no longer requires AST parsing or handcrafted features.
A 9B open-source model suffices, lowering both monetary cost and data-privacy exposure compared with large closed models.
Performance holds steady across varying commit sizes and message lengths.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same prompting approach could be tested on non-Java languages to check language independence.
Integration into review platforms could automatically highlight salient classes in the diff view.
Few-shot examples drawn from the target project might further improve accuracy without retraining.

Load-bearing premise

The labels in the ApacheJavaCM dataset correctly mark which classes are the salient ones driving the commit changes.

What would settle it

Independent experts re-labeling a random sample of commits and finding systematic disagreement with the original labels on more than a small fraction of cases.
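
One way to quantify such a re-labeling check is a chance-corrected agreement score between the original labels and the independent ones. The sketch below uses Cohen's kappa for binary salient / non-salient labels. The choice of kappa, the function name `cohen_kappa`, and the toy label vectors are assumptions made for illustration; the page does not say which agreement statistic, if any, the paper reports.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of binary (0/1) labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Fraction of classes on which the two labelings agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each labeling's rate of "salient".
    rate_a = sum(labels_a) / n
    rate_b = sum(labels_b) / n
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        # Both labelings use one identical label throughout; kappa is
        # undefined there, and this sketch reports 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy data: 1 = salient, 0 = non-salient, for eight classes.
original = [1, 1, 0, 0, 1, 0, 0, 0]
relabeled = [1, 0, 0, 0, 1, 0, 1, 0]
kappa = cohen_kappa(original, relabeled)  # 6/8 raw agreement, kappa = 7/15
```

Raw agreement here is 0.75, but kappa is only about 0.47 because non-salient classes dominate both labelings, which is why a chance-corrected statistic is the more informative check on an imbalanced label set.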


Original abstract

Code review requires reviewers to understand the core intent of code changes, which becomes difficult when a commit modifies multiple classes. In such commits, one or more primarily modified classes, referred to as salient classes, may induce modifications in other classes. Accurate identification of salient classes offers reviewers an effective entry point to navigate code changes and facilitates program comprehension. Existing state-of-the-art approaches rely on complex program-analysis procedures, including Abstract Syntax Tree (AST) parsing, class relation extraction, handcrafted feature engineering, or dependency graph construction. To this end, we study whether language models (LMs) can identify salient classes directly from commits without feature engineering, graph construction, or training. We first construct a new dataset ApacheJavaCM, derived from the ApacheCM dataset, containing 7,911 commits and 25,914 labeled classes. On this dataset, we systematically evaluate whether LMs can identify salient classes directly from commits and compare with the strongest reproducible state-of-the-art (SOTA) baseline. The evaluation covers two large language models (LLMs), GPT-5.4 and DeepSeek-V3.2, one small language model (SLM), Qwen3.5-9B, and three prompting strategies: zero-shot, few-shot, and chain-of-thought. The LMs substantially outperform the baseline while remaining stable across commit characteristics and selected LMs. We also found that, for salient class identification tasks, a 9B-parameter open-source SLM, Qwen3.5-9B, under few-shot prompting, achieves performance comparable to that of a much larger closed-source LLM, GPT-5.4. These results suggest that lightweight, locally deployable SLMs are sufficient for industrial salient class identification tasks and can reduce both cost and privacy barriers associated with relying on closed-source LLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

LMs beat the program-analysis baseline on salient class ID using a new dataset, but everything rests on unverified label quality in ApacheJavaCM.


The paper's main move is to drop the usual AST parsing, dependency graphs, and feature engineering and just feed commit text to language models for salient class identification. They built ApacheJavaCM (7,911 commits, 25,914 labels) from ApacheCM and tested GPT-5.4, DeepSeek-V3.2, and Qwen3.5-9B under zero-shot, few-shot, and chain-of-thought prompting. The LMs beat the strongest reproducible baseline and hold steady across commit characteristics, with the 9B open model matching the much larger closed one under few-shot.

That direct LM approach and the new dataset are the actual novelties. Prior work in the area relied on heavy program analysis, so skipping it is a practical simplification. The prompting comparison is systematic and the stability check across commit types is a useful addition.

The load-bearing assumption is the dataset labels. The abstract says the data is derived from ApacheCM but gives no operational definition of salience, no derivation steps, no validation against the original labels, and no inter-annotator numbers. If labeling errors correlate with commit size or class count, the outperformance and stability claims become hard to trust. The abstract also withholds the actual metrics, which makes the size of the gains impossible to judge from the summary alone.

This is for software engineering researchers who want to apply LMs to code review without building full program-analysis pipelines. It has enough new data and a clear empirical question to go to peer review, though referees will need to see the label construction details and the numbers before the results can be taken as settled.

Referee Report

2 major / 1 minor

Summary. The paper claims that language models can identify salient classes in multi-class commits directly from commit text without program analysis, feature engineering, or training. The authors construct ApacheJavaCM (7,911 commits, 25,914 classes derived from ApacheCM), evaluate GPT-5.4, DeepSeek-V3.2, and Qwen3.5-9B under zero-shot, few-shot, and chain-of-thought prompting against the strongest reproducible SOTA baseline, and report that LMs substantially outperform the baseline while remaining stable across commit characteristics, with the 9B SLM under few-shot prompting matching GPT-5.4.

Significance. If the dataset labels prove reliable, the work would demonstrate that small open-source LMs suffice for a practical code-review assistance task, lowering cost and privacy barriers compared with large closed-source models. The multi-model, multi-prompting evaluation and stability analysis across commit characteristics would strengthen the case for deployable SE tools.

major comments (2)

[Dataset construction] Dataset construction section: The ApacheJavaCM labels are described only as 'derived from ApacheCM' with no operational definition of salience, derivation procedure, commit selection criteria, or validation (inter-annotator agreement, comparison to original ApacheCM labels). All performance claims, including outperformance and stability, rest on these 25,914 labels being accurate unbiased ground truth; without this evidence the evaluation results cannot be assessed.
[Evaluation / Results] Evaluation and results sections: The abstract and main text report outperformance and stability but supply no concrete metrics, error bars, baseline reproduction details, or statistical tests. This prevents verification of the magnitude of gains and the cross-characteristic stability claim.

minor comments (1)

Example prompts for the three strategies (zero-shot, few-shot, CoT) would improve reproducibility; consider adding them to an appendix.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and commit to revisions that will improve the verifiability of the work.


Referee: [Dataset construction] Dataset construction section: The ApacheJavaCM labels are described only as 'derived from ApacheCM' with no operational definition of salience, derivation procedure, commit selection criteria, or validation (inter-annotator agreement, comparison to original ApacheCM labels). All performance claims, including outperformance and stability, rest on these 25,914 labels being accurate unbiased ground truth; without this evidence the evaluation results cannot be assessed.

Authors: We agree that the current manuscript provides insufficient detail on dataset construction. In the revised version we will expand the relevant section to explicitly state the operational definition of salience from ApacheCM, the precise derivation steps used to create ApacheJavaCM (including commit selection criteria), and any validation statistics available from the source dataset. This will allow readers to assess label reliability directly. revision: yes
Referee: [Evaluation / Results] Evaluation and results sections: The abstract and main text report outperformance and stability but supply no concrete metrics, error bars, baseline reproduction details, or statistical tests. This prevents verification of the magnitude of gains and the cross-characteristic stability claim.

Authors: We acknowledge the absence of concrete numerical results, error bars, baseline reproduction details, and statistical tests in the submitted manuscript. In revision we will add a dedicated results subsection containing the full performance metrics (precision, recall, F1), error bars or confidence intervals, explicit baseline reproduction protocol, and statistical significance tests supporting both the outperformance claims and the stability analysis across commit characteristics. revision: yes
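
For reference, the per-class metrics named in the response above (precision, recall, F1) and a simple stability breakdown by commit size could be computed along these lines. This is a sketch under stated assumptions: the function names `prf1` and `prf1_by_commit_size`, the bucket edges (3 and 6 modified classes), and the toy predictions are hypothetical, and nothing here reflects the paper's actual evaluation protocol or numbers.

```python
from collections import defaultdict


def prf1(pairs):
    """Precision, recall, F1 over (gold, predicted) booleans; salient is positive."""
    tp = sum(1 for gold, pred in pairs if gold and pred)
    fp = sum(1 for gold, pred in pairs if not gold and pred)
    fn = sum(1 for gold, pred in pairs if gold and not pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def prf1_by_commit_size(commits, boundaries=(3, 6)):
    """Score commits grouped by how many classes they modify.

    commits is a list of commits; each commit is a list of (gold, predicted)
    pairs, one per modified class. boundaries are hypothetical bucket edges.
    """
    buckets = defaultdict(list)
    for commit in commits:
        size = len(commit)
        if size <= boundaries[0]:
            key = "small"
        elif size <= boundaries[1]:
            key = "medium"
        else:
            key = "large"
        buckets[key].extend(commit)
    return {key: prf1(pairs) for key, pairs in buckets.items()}


# Toy predictions, invented for this sketch.
commits = [
    [(True, True), (False, False)],                                # 2 classes
    [(True, True), (False, True), (False, False), (True, False)],  # 4 classes
]
overall = prf1([pair for commit in commits for pair in commit])
by_size = prf1_by_commit_size(commits)
```

Comparing the per-bucket scores against the overall score is one simple way to make a stability claim checkable; confidence intervals and significance tests would still be needed on top of it.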

Circularity Check

0 steps flagged

No circularity: empirical evaluation on external dataset labels against reproducible baseline


The paper constructs ApacheJavaCM from ApacheCM, then measures LM performance (zero/few-shot/CoT prompting on GPT-5.4, DeepSeek-V3.2, Qwen3.5-9B) directly against the 25,914 provided labels and a SOTA baseline. No equations, fitted parameters renamed as predictions, self-definitional steps, or load-bearing self-citations appear in the derivation. Claims reduce to standard comparison of model outputs vs. held-out labels, with no reduction by construction to the inputs themselves. The label-accuracy assumption is a validity concern, not a circularity mechanism.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the validity of the derived dataset labels and the assumption that the tested prompting strategies and model selections are representative of practical use; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: The ApacheCM dataset provides a reliable base for deriving accurate salient class labels in ApacheJavaCM.
The paper derives the new dataset from it without detailing validation of label accuracy or selection criteria.



