ACM Computing Surveys

A survey on uncertainty quantification of large language models: Taxonomy, open research challenges, future directions · 2025

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

ProcCtrlBench: Evaluating Process-Level Defects and Control Preservation in LLM Coding Agents

cs.SE · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

ProcCtrlBench introduces an ontology of 11 defect types across 4 categories plus control preservation metrics to evaluate LLM coding agent trajectories on 200 cases from AndroidBench, TerminalBench, and SWE-bench-Verified.

Estimating LLM Grading Ability and Response Difficulty in Automatic Short Answer Grading via Item Response Theory

cs.CL · 2026-04-30 · unverdicted · novelty 7.0 · 2 refs

Item response theory applied to 17 LLMs on SciEntsBank and Beetle reveals that models with similar overall scores differ sharply in robustness to difficult responses, with errors clustering on partial-credit labels.

citing papers explorer

Showing 1 of 1 citing paper after filters.

ProcCtrlBench: Evaluating Process-Level Defects and Control Preservation in LLM Coding Agents

cs.SE · 2026-05-18 · unverdicted · none · ref 23 · 2 links

ProcCtrlBench introduces an ontology of 11 defect types across 4 categories plus control preservation metrics to evaluate LLM coding agent trajectories on 200 cases from AndroidBench, TerminalBench, and SWE-bench-Verified.
