McKee, Daniel Gillick, et al

Irina Jurenka, Markus Kunesch, Kevin R McKee, Daniel Gillick, Shaojian Zhu, Sara Wiltberger, Shubham Milind Phal, Katherine Hermann, Daniel Kasenberg, Avishkar Bhoopchand, et al · 2024 · arXiv 2407.12687

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

representative citing papers

Evaluating Answer Leakage Robustness of LLM Tutors against Adversarial Student Attacks

cs.CR · 2026-04-20 · unverdicted · novelty 7.0

LLM tutors leak answers under adversarial student attacks, but a fine-tuned jailbreak agent and simple defenses can benchmark and improve robustness.

The Missing Evaluation Axis: What 10,000 Student Submissions Reveal About AI Tutor Effectiveness

cs.CY · 2026-05-07 · conditional · novelty 6.0

Behavioral signals from how students use AI tutor feedback in 10k code submissions reveal differences between tutors and correlate more strongly with perceived helpfulness than pedagogical quality alone.

Beyond the AI Tutor: Social Learning with LLM Agents

cs.HC · 2026-04-03 · unverdicted · novelty 6.0

Two controlled experiments show multi-agent LLM configurations with both tutors and peers deliver higher learning gains and less homogeneous outputs than single-LLM tutoring in math problem-solving and essay writing.

Ceci n'est pas une explication: Evaluating Explanation Failures as Explainability Pitfalls in Language Learning Systems

cs.HC · 2026-04-28 · unverdicted · novelty 4.0

AI explanations in language learning often fail across six dimensions like diagnostic accuracy and self-regulation support, creating hidden risks that demand better evaluation frameworks such as L2-Bench.

Latency and Cost of Multi-Agent Intelligent Tutoring at Scale

cs.CY · 2026-04-27 · unverdicted · novelty 3.0

Priority PayGo keeps multi-agent tutoring responses under 4 seconds even at 50 concurrent users, while costs stay below textbook prices per student.

citing papers explorer

Showing 5 of 5 citing papers.

Evaluating Answer Leakage Robustness of LLM Tutors against Adversarial Student Attacks cs.CR · 2026-04-20 · unverdicted · none · ref 69
LLM tutors leak answers under adversarial student attacks, but a fine-tuned jailbreak agent and simple defenses can benchmark and improve robustness.
The Missing Evaluation Axis: What 10,000 Student Submissions Reveal About AI Tutor Effectiveness cs.CY · 2026-05-07 · conditional · none · ref 7
Behavioral signals from how students use AI tutor feedback in 10k code submissions reveal differences between tutors and correlate more strongly with perceived helpfulness than pedagogical quality alone.
Beyond the AI Tutor: Social Learning with LLM Agents cs.HC · 2026-04-03 · unverdicted · none · ref 39
Two controlled experiments show multi-agent LLM configurations with both tutors and peers deliver higher learning gains and less homogeneous outputs than single-LLM tutoring in math problem-solving and essay writing.
Ceci n'est pas une explication: Evaluating Explanation Failures as Explainability Pitfalls in Language Learning Systems cs.HC · 2026-04-28 · unverdicted · none · ref 19
AI explanations in language learning often fail across six dimensions like diagnostic accuracy and self-regulation support, creating hidden risks that demand better evaluation frameworks such as L2-Bench.
Latency and Cost of Multi-Agent Intelligent Tutoring at Scale cs.CY · 2026-04-27 · unverdicted · none · ref 24
Priority PayGo keeps multi-agent tutoring responses under 4 seconds even at 50 concurrent users, while costs stay below textbook prices per student.

McKee, Daniel Gillick, et al

fields

years

verdicts

representative citing papers

citing papers explorer