SemEval-2013 Task 7: The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge

Myroslava Dzikovska, Rodney Nielsen, Chris Brew, Claudia Leacock, Danilo Giampiccolo, Luisa Bentivogli · 2013

2 Pith papers cite this work.

representative citing papers

Estimating LLM Grading Ability and Response Difficulty in Automatic Short Answer Grading via Item Response Theory

cs.CL · 2026-04-30 · unverdicted · novelty 7.0 · 2 refs

Item response theory applied to 17 LLMs on SciEntsBank and Beetle reveals that models with similar overall scores differ sharply in robustness to difficult responses, with errors clustering on partial-credit labels.

When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

A new benchmark shows LLM first-answer accuracy on procedural arithmetic drops from 63% (5 steps) to 20% (95 steps) due to execution failures like skipped steps and premature answers.
