HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task

Yu, Zhaojian, Zhao, Yilun, Cohan, Arman, Zhang, Xiao-Ping · 2025 · DOI 10.18653/v1/2025.findings-acl.686

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

cs.AI · 2026-06-04 · unverdicted · novelty 5.0

Training LLMs via rubric-based self-rewarding RL with GRPO improved feeling expression and sycophancy robustness but degraded truthful QA performance.

When AI Says It Feels · cs.AI · 2026-06-04 · unverdicted · none · ref 54