pith. machine review for the scientific record.

arxiv: 2505.17056 · v2 · submitted 2025-05-17 · 💻 cs.CL · cs.AI

Recognition: unknown

From Test-taking to Cognitive Scaffolding: A Pedagogical Diagnostic Benchmark for LLMs on English Standardized Tests

keywords: cognitive · framework · diagnostic · pedagogical · reasoning · standardized · benchmark · english

As large language models (LLMs) are increasingly integrated into educational tools, current evaluations on standardized tests focus predominantly on binary outcome accuracy. An effective AI tutor, however, must exhibit faithful reasoning, elucidate solution strategies, and diagnose specific human misconceptions. To bridge this gap, we introduce a pedagogical diagnostic framework that models English Standardized Test (EST) problem-solving as a traversal through a cognitive framework. Based on this framework, we present ESTBook, a multimodal benchmark encompassing 10,576 questions and 29 task types across five major exams. Unlike traditional datasets, ESTBook goes beyond data aggregation by enriching questions with formalized reasoning trajectories and distractor rationales that capture specific cognitive traps. Through extensive evaluations, we empirically demonstrate the practical utility of our diagnostic framework, showing that identifying cognitive trajectories helps mitigate performance gaps and improves pedagogical reasoning through guided elicitation.

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. From Planning to Revision: How AI Writing Support at Different Stages Alters Ownership

    cs.HC · 2026-04 · unverdicted · novelty 6.0

    AI support during drafting decreases writing ownership more than during planning due to greater AI text and idea contributions, while improving essay quality.