Yu, Qiang Yang, and Xing Xie

Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S · 2023

2 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Understanding the Limits of Automated Evaluation for Code Review Bots in Practice

cs.SE · 2026-04-27 · unverdicted · novelty 5.0

Automated LLM-based evaluation of code review bot comments achieves only moderate agreement (0.44-0.62) with developer labels in an industrial dataset because developer decisions reflect contextual constraints beyond comment quality.

TrustLLM: Trustworthiness in Large Language Models

cs.CL · 2024-01-10 · unverdicted · novelty 5.0

TrustLLM defines eight trustworthiness principles, creates a six-dimension benchmark, and evaluates 16 LLMs. Proprietary models generally lead, though some open-source models come close, and over-calibration can hurt utility.

citing papers explorer


Understanding the Limits of Automated Evaluation for Code Review Bots in Practice · cs.SE · 2026-04-27 · unverdicted · none · ref 39
TrustLLM: Trustworthiness in Large Language Models · cs.CL · 2024-01-10 · unverdicted · none · ref 98
