GradingAttack: Exposing Security Vulnerabilities in LLM Based Educational Grading Agents

Xueyi Li; Yongdong Wu; Zhuoneng Zhou; Zitao Liu

arxiv: 2602.00979 · v2 · pith:UXVMTN4Snew · submitted 2026-02-01 · 💻 cs.CR · cs.AI· cs.CL

GradingAttack: Exposing Security Vulnerabilities in LLM Based Educational Grading Agents

Xueyi Li , Zhuoneng Zhou , Zitao Liu , Yongdong Wu This is my paper

Pith reviewed 2026-05-25 06:59 UTC · model grok-4.3

classification 💻 cs.CR cs.AIcs.CL

keywords adversarial attackLLM securityeducational agentsautomatic gradingshort answer gradingprompt manipulationtoken-level attack

0 comments

The pith

Adversarial prompt and token changes can alter LLM grading outcomes with high success and stealth.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GradingAttack to demonstrate how LLM-based agents for short-answer grading can be manipulated. It develops token-level and prompt-level strategies that change the grades assigned while keeping the input changes difficult to notice. Experiments across multiple datasets show both strategies succeed in compromising the agents, with prompt-level versions succeeding more often and token-level versions proving stealthier. This matters because such agents are already used to assess student work in educational settings, where manipulated grades could affect fairness and trust. The findings indicate that these systems currently have no strong built-in protections against targeted interference.

Core claim

GradingAttack is a fine-grained adversarial attack framework that systematically evaluates the security vulnerabilities of LLM based educational grading agents. It designs token-level and prompt-level attack strategies that manipulate agent grading outcomes while maintaining high stealth. Experiments on multiple datasets demonstrate that both attack strategies effectively compromise grading agents, with prompt-level attacks achieving higher success rates and token-level attacks exhibiting superior stealth capability. This reveals that current LLM based educational agents lack robust defenses against adversarial attacks.

What carries the argument

GradingAttack framework with token-level and prompt-level attack strategies that manipulate grading outcomes while maintaining high stealth.

If this is right

LLM grading agents can have their outputs changed by adversarial inputs.
Prompt-level attacks tend to succeed more frequently at altering grades.
Token-level attacks are harder for observers to detect.
Educational LLM agents require additional security measures to be trustworthy.
Automated grading carries risks from undetected manipulation in real use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Developers could add adversarial testing using these attack types before releasing grading agents.
Similar manipulation risks may apply to other LLM agents making assessment or feedback decisions.
Detection methods focused on input pattern anomalies could reduce success of these attacks.
Defenses might need to be specific to short-answer grading rather than general LLM security.

Load-bearing premise

The tested LLM grading agents and datasets are representative of real-world educational deployments, and the attack success rates will hold when the agents are used in live classroom settings rather than controlled experiments.

What would settle it

A live deployment of an LLM grading agent in an actual classroom where attempted prompt-level and token-level attacks fail to change grades or are reliably detected by standard review processes.

Figures

Figures reproduced from arXiv: 2602.00979 by Xueyi Li, Yongdong Wu, Zhuoneng Zhou, Zitao Liu.

**Figure 2.** Figure 2: The overview of our GradingAttack framework. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Performance comparisons of token-level (dash line) and prompt-level attack methods on [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: The impact of attack on different labels. [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: Effect of role-play string placement on performance. R, S, and P represent the role-play strings, student answer and grading prompt, respectively, with their order indicating the relative placement. To further investigate the impact of role-play string placement on attack effectiveness in our GradingAttackRole method, we conduct experiments by varying the position of role-play strings in the adversarial … view at source ↗

**Figure 6.** Figure 6: An example prompt for LLM based ASAG tasks. [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗

**Figure 7.** Figure 7: A complete grading process. 15 [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗

read the original abstract

Large language models (LLMs) are increasingly deployed as educational agents for automatic short answer grading (ASAG) in real-world educational environments, significantly boosting assessment efficiency and scalability. However, when these grading agents operate ``in the wild'', their vulnerability to adversarial manipulation raises critical concerns about agent security and trustworthiness. In this paper, we introduce GradingAttack, a fine-grained adversarial attack framework that systematically evaluates the security vulnerabilities of LLM based educational grading agents. Specifically, we design token-level and prompt-level attack strategies that manipulate agent grading outcomes while maintaining high stealth, exposing fundamental weaknesses in current agent deployments. Experiments on multiple datasets demonstrate that both attack strategies effectively compromise grading agents, with prompt-level attacks achieving higher success rates and token-level attacks exhibiting superior stealth capability. Our findings reveal that current LLM based educational agents lack robust defenses against adversarial attacks, underscoring the urgent need for developing secure and trustworthy agent systems for critical educational applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

GradingAttack applies token- and prompt-level attacks to LLM grading agents but the abstract gives almost no experimental details, so the strength of the claims is hard to judge.

read the letter

The main thing to know is that this paper introduces GradingAttack, a framework that combines token-level and prompt-level adversarial strategies aimed at LLM-based short answer grading agents. It claims these attacks can change grades while staying stealthy, and that prompt-level versions succeed more often while token-level ones are harder to detect. The work is a direct application of existing adversarial ideas to the education setting rather than a new method from scratch. It does a reasonable job of spelling out why this matters for fairness and trust when these systems move into real classrooms. The framing around agent security in ASAG is straightforward and on target for the subfield. The soft spots are mostly around evidence. The abstract says experiments on multiple datasets show the attacks work, but it supplies no model names, no concrete success rates, no baselines, no metrics for stealth, and no discussion of how the attacks were constructed or evaluated. That makes it impossible to tell whether the results are robust or depend on weak prompts in the tested agents. The stress-test point about generalization is fair; if real deployments use chain-of-thought, few-shot examples, or fine-tuned graders, the reported effectiveness could drop. No circularity or invented math appears, and the citation pattern is not an issue since this is mostly empirical. This paper is for people working on LLM security or educational AI tools who want to see concrete attack examples in this domain. It shows honest engagement with the practical risks. I would send it to peer review so the full methods and results can be checked, even though the current version needs more detail to stand on its own.

Referee Report

2 major / 1 minor

Summary. The paper introduces GradingAttack, a fine-grained adversarial attack framework for LLM-based automatic short answer grading (ASAG) agents. It proposes token-level and prompt-level attack strategies that manipulate grading outcomes while aiming for high stealth, and claims that experiments on multiple datasets show both strategies effectively compromise the agents, with prompt-level attacks achieving higher success rates and token-level attacks exhibiting superior stealth.

Significance. If the empirical results hold with proper documentation, the work is significant for highlighting security risks in deployed educational AI systems, potentially motivating defenses for fairness and trustworthiness in automated assessment. The empirical attack framework, if reproducible with clear metrics and baselines, would contribute to the growing literature on LLM vulnerabilities in critical applications.

major comments (2)

[Abstract] Abstract: the claim that 'experiments on multiple datasets demonstrate that both attack strategies effectively compromise grading agents' provides no details on the LLMs tested, the datasets, the definition or computation of attack success rates, baselines, or any statistical significance tests. This absence makes it impossible to assess whether the data support the central claim.
[Abstract] Abstract: the reported success rates rest on the untested assumption that the specific grading agents (LLMs + prompts) are representative; there is no indication that experiments varied system prompts, incorporated chain-of-thought reasoning, few-shot examples, or fine-tuned models, which could sharply reduce attack effectiveness in real deployments.

minor comments (1)

[Abstract] The abstract refers to 'high stealth' and 'superior stealth capability' without defining the metric (e.g., detection rate by humans or other LLMs) or how it was measured.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We address each major comment below and outline planned revisions to strengthen the presentation of our experimental claims.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that 'experiments on multiple datasets demonstrate that both attack strategies effectively compromise grading agents' provides no details on the LLMs tested, the datasets, the definition or computation of attack success rates, baselines, or any statistical significance tests. This absence makes it impossible to assess whether the data support the central claim.

Authors: We agree that the abstract, as a high-level summary, omits these specifics. The manuscript body (Sections 3.2, 4.1, and 5) details the LLMs evaluated (GPT-3.5-Turbo, GPT-4, Llama-2-7B), the datasets (SciEntsBank, Beetle, and two additional ASAG corpora), the attack success rate metric (fraction of responses whose assigned grade is altered to the attacker-chosen target), the baseline comparisons (random token replacement and prompt paraphrasing), and the use of paired t-tests for significance. To improve accessibility, we will revise the abstract to include a concise clause summarizing the LLMs, datasets, and primary success-rate definition. revision: yes
Referee: [Abstract] Abstract: the reported success rates rest on the untested assumption that the specific grading agents (LLMs + prompts) are representative; there is no indication that experiments varied system prompts, incorporated chain-of-thought reasoning, few-shot examples, or fine-tuned models, which could sharply reduce attack effectiveness in real deployments.

Authors: This is a fair observation. Our experiments used standard zero-shot system prompts with the listed off-the-shelf models; we did not ablate system-prompt wording, add chain-of-thought, few-shot exemplars, or evaluate fine-tuned graders. We will add an explicit limitations paragraph acknowledging that more elaborate prompting or fine-tuning could increase robustness, and we will frame the current results as evidence of vulnerabilities in typical current deployments rather than claiming universal representativeness. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical attack evaluation with no derivations or self-referential reductions

full rationale

The paper introduces GradingAttack as an empirical adversarial framework consisting of token-level and prompt-level strategies, evaluated via experiments on multiple datasets. No equations, derivations, fitted parameters presented as predictions, or self-citation chains appear in the provided abstract or described structure. The central claims rest on reported attack success rates from controlled experiments rather than any self-definitional or load-bearing reduction to inputs. This matches the default expectation for non-circular empirical security papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are described beyond the attack framework name itself.

pith-pipeline@v0.9.0 · 5695 in / 1052 out tokens · 24530 ms · 2026-05-25T06:59:09.081668+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 4 internal anchors

[1]

Rama Sree, and M

Sridevi Bonthu, S. Rama Sree, and M. H. M. Krishna Prasad. Automated short answer grading using deep learning: A survey. InProceedings of the International Cross-Domain Conference for Machine Learning and Knowledge Extraction, Virtual Event, August 2021

work page 2021
[2]

The eras and trends of automatic short answer grading

Steven Burrows, Iryna Gurevych, and Benno Stein. The eras and trends of automatic short answer grading. International Journal of Artificial Intelligence in Education, 25:60–117, 2015

work page 2015
[3]

Automatic short answer grading for finnish with chatgpt

Li-Hsin Chang and Filip Ginter. Automatic short answer grading for finnish with chatgpt. InProceedings of the AAAI Conference on Artificial Intelligence, Vancouver, Canada, March 2024

work page 2024
[4]

Using large language models for automated grading of student writing about science.International Journal of Artificial Intelligence in Education, pages 1–35, 2025

Impey Chris, Wenger Matthew, Garuda Nikhil, Golchin Shahriar, and Stamer Sarah. Using large language models for automated grading of student writing about science.International Journal of Artificial Intelligence in Education, pages 1–35, 2025

work page 2025
[5]

Training Verifiers to Solve Math Word Problems

Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems.arXiv preprint arXiv:2110.14168, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[6]

Security and privacy challenges of large language models: A survey.ACM Computing Surveys, pages 1–34, 2025

Badhan Chandra Das, M Hadi Amini, and Yanzhao Wu. Security and privacy challenges of large language models: A survey.ACM Computing Surveys, pages 1–34, 2025

work page 2025
[7]

nswvt- nvakgxpm

Yuning Ding, Brian Riordan, Andrea Horbach, Aoife Cahill, and Torsten Zesch. Don’t take “nswvt- nvakgxpm” for an answer–the surprising vulnerability of automatic content scoring systems to adversarial input. InProceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, December 2020

work page 2020
[8]

SemEval-2013 task 7: The joint student response analysis and 8th recognizing textual entailment challenge

Myroslava Dzikovska, Rodney Nielsen, Chris Brew, Claudia Leacock, Danilo Giampiccolo, Luisa Ben- tivogli, Peter Clark, Ido Dagan, and Hoa Trang Dang. SemEval-2013 task 7: The joint student response analysis and 8th recognizing textual entailment challenge. InProceedings of the 7th International Workshop on Semantic Evaluation, Atlanta, Georgia, USA, June 2013

work page 2013
[9]

Cheating automatic short answer grading with the adversarial usage of adjectives and adverbs.International Journal of Artificial Intelligence in Education, 34:616–646, 2024

Anna Filighera, Sebastian Ochs, Tim Steuer, and Thomas Tregel. Cheating automatic short answer grading with the adversarial usage of adjectives and adverbs.International Journal of Artificial Intelligence in Education, 34:616–646, 2024

work page 2024
[10]

Fooling automatic short answer grading systems

Anna Filighera, Tim Steuer, and Christoph Rensing. Fooling automatic short answer grading systems. In Proceedings of the 21st International Conference on Artificial Intelligence in Education, Ifrane, Morocco, July 2020. 10

work page 2020
[11]

Measuring mathematical problem solving with the math dataset

Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring mathematical problem solving with the math dataset. InProceedings of 34th Conference on Neural Information Processing Systems Datasets and Benchmarks Track, Virtual Event, December 2021

work page 2021
[12]

Cheating during the college years: How do business school students compare?Journal of Business Ethics, 72:197–206, 2007

Helen A Klein, Nancy M Levenburg, Marie McKendall, and William Mothersell. Cheating during the college years: How do business school students compare?Journal of Business Ethics, 72:197–206, 2007

work page 2007
[13]

A multilingual dataset of adversarial attacks to automatic content scoring systems

Ronja Laarmann-Quante, Christopher Chandler, Noemi Incirkus, Vitaliia Ruban, Alona Solopov, and Luca Steen. A multilingual dataset of adversarial attacks to automatic content scoring systems. InProceedings of the 20th Conference on Natural Language Processing, Vienna, Austria, September 2024

work page 2024
[14]

Mwptoolkit: An open-source framework for deep learning-based math word problem solvers

Yihuai Lan, Lei Wang, Qiyuan Zhang, Yunshi Lan, Bing Tian Dai, Yan Wang, Dongxiang Zhang, and Ee-Peng Lim. Mwptoolkit: An open-source framework for deep learning-based math word problem solvers. InProceedings of the 36th AAAI Conference on Artificial Intelligence, Virtual Event, February 2022

work page 2022
[15]

Advancing adversarial suffix transfer learning on aligned large language models

Hongfu Liu, Yuxi Xie, Ye Wang, and Michael Shieh. Advancing adversarial suffix transfer learning on aligned large language models. InProceedings of the Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA, November 2024

work page 2024
[16]

Autodan: Generating stealthy jailbreak prompts on aligned large language models

Xiaogeng Liu, Nan Xu, Muhao Chen, and Chaowei Xiao. Autodan: Generating stealthy jailbreak prompts on aligned large language models. InProceedings of the 12th International Conference on Learning Representations, Vienna, Austria, May 2024

work page 2024
[17]

Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study

Yi Liu, Gelei Deng, Zhengzi Xu, Yuekang Li, Yaowen Zheng, Ying Zhang, Lida Zhao, Tianwei Zhang, Kailong Wang, and Yang Liu. Jailbreaking chatgpt via prompt engineering: An empirical study.arXiv preprint arXiv:2305.13860, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[18]

Comuniqa : Exploring large language models for improving speaking skills

Manas Mhasakar, Shikhar Sharma, Apurv Mehra, Utkarsh Venaik, Ujjwal Singhal, Dhruv Kumar, and Kashish Mittal. Comuniqa : Exploring large language models for improving speaking skills. InProceedings of the 7th ACM SIGCAS/SIGCHI Conference of Computing and Sustainable Societies, New Delhi, India, July 2024

work page 2024
[19]

A survey on deep learning-based automated essay scoring and feedback generation.Artificial Intelligence Review, 58:1–40, 2025

Haile Misgna, Byung-Won On, Ingyu Lee, and Gyu Sang Choi. A survey on deep learning-based automated essay scoring and feedback generation.Artificial Intelligence Review, 58:1–40, 2025

work page 2025
[20]

Autotutor meets large language models: A language model tutor with rich pedagogy and guardrails

Sankalan Pal Chowdhury, Vilém Zouhar, and Mrinmaya Sachan. Autotutor meets large language models: A language model tutor with rich pedagogy and guardrails. InProceedings of the 11th ACM Conference on Learning @ Scale, New York, NY , USA, July 2024

work page 2024
[21]

Embeddings for automatic short answer grading: A scoping review.IEEE Transactions on Learning Technologies, 16:219–231, 2023

Marko Putnikovic and Jelena Jovanovic. Embeddings for automatic short answer grading: A scoping review.IEEE Transactions on Learning Technologies, 16:219–231, 2023

work page 2023
[22]

Abscribe: Rapid exploration & organization of multiple writing variations in human-ai co-writing tasks using large language models

Mohi Reza, Nathan M Laundry, Ilya Musabirov, Peter Dushniku, Zhi Yuan “Michael” Yu, Kashish Mittal, Tovi Grossman, Michael Liut, Anastasia Kuzminykh, and Joseph Jay Williams. Abscribe: Rapid exploration & organization of multiple writing variations in human-ai co-writing tasks using large language models. InProceedings of the 2024 CHI Conference on Human ...

work page 2024
[23]

Enhancing short answer grading with openai apis

Sebastian Speiser and Annegret Weng. Enhancing short answer grading with openai apis. InProceedings of the 21st International Conference on Information Technology Based Higher Education and Training, Paris, France, November 2024

work page 2024
[24]

Unimodal regularisation based on beta distribution for deep ordinal regression.Pattern Recognition, 122:1–10, February 2022

Pedro Antonio Gutiérrez Víctor Manuel Vargas and César Hervás-Martínez. Unimodal regularisation based on beta distribution for deep ordinal regression.Pattern Recognition, 122:1–10, February 2022

work page 2022
[25]

Jailbreak and guard aligned language models with only few in-context demonstrations.arXiv preprint arXiv:2310.06387, 2023

Zeming Wei, Yifei Wang, and Yisen Wang. Jailbreak and guard aligned language models with only few in-context demonstrations.arXiv preprint arXiv:2310.06387, 2023

work page arXiv 2023
[26]

Factors associated with cheating among college students: A review.Research in Higher Education, 39:235–274, 1998

Bernard E Whitley. Factors associated with cheating among college students: A review.Research in Higher Education, 39:235–274, 1998

work page 1998
[27]

An llm can fool itself: A prompt-based adversarial attack.arXiv preprint arXiv:2310.13345, 2023

Xilie Xu, Keyi Kong, Ning Liu, Lizhen Cui, Di Wang, Jingfeng Zhang, and Mohan Kankanhalli. An llm can fool itself: A prompt-based adversarial attack.arXiv preprint arXiv:2310.13345, 2023

work page arXiv 2023
[28]

Evaluating the Performance of Large Language Models on GAOKAO Benchmark

Xiaotian Zhang, Chunyang Li, Yi Zong, Zhengyu Ying, Liang He, and Xipeng Qiu. Evaluating the performance of large language models on gaokao benchmark.arXiv preprint arXiv:2305.12474, 2023. 11

work page internal anchor Pith review Pith/arXiv arXiv 2023
[29]

Boosting jailbreak attack with momentum

Yihao Zhang and Zeming Wei. Boosting jailbreak attack with momentum. InProceedings of the ICLR 2024 Workshop on Reliable and Responsible Foundation Models, Vienna, Austria, May 2024

work page 2024
[30]

A survey of recent backdoor attacks and defenses in large language models.Transactions on Machine Learning Research, pages 1–28, 2025

Shuai Zhao, Meihuizi Jia, Zhongliang Guo, Leilei Gan, XIAOYU XU, Xiaobao Wu, Jie Fu, Feng Yichao, Fengjun Pan, and Anh Tuan Luu. A survey of recent backdoor attacks and defenses in large language models.Transactions on Machine Learning Research, pages 1–28, 2025

work page 2025
[31]

Universal vulnerabilities in large language models: Backdoor attacks for in-context learning

Shuai Zhao, Meihuizi Jia, Luu Anh Tuan, Fengjun Pan, and Jinming Wen. Universal vulnerabilities in large language models: Backdoor attacks for in-context learning. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA, November 2024

work page 2024
[32]

Can llm replace stack overflow? a study on robustness and reliability of large language model code generation

Li Zhong and Zilong Wang. Can llm replace stack overflow? a study on robustness and reliability of large language model code generation. InProceedings of the 38th AAAI Conference on Artificial Intelligence, Vancouver, Canada, February 2024

work page 2024
[33]

Virtual context enhancing jailbreak attacks with special token injection

Yuqi Zhou, Lin Lu, Ryan Sun, Pan Zhou, and Lichao Sun. Virtual context enhancing jailbreak attacks with special token injection. InFindings of the Association for Computational Linguistics: EMNLP 2024, Miami, Florida, USA, November 2024

work page 2024
[34]

Universal and Transferable Adversarial Attacks on Aligned Language Models

Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models.arXiv preprint arXiv:2307.15043, 2023. 12 A Effectiveness of CAS Metric Evaluating adversarial attack performance often relies solely on the ASR. However, this metric alone is insufficient to ref...

work page internal anchor Pith review Pith/arXiv arXiv 2023

[1] [1]

Rama Sree, and M

Sridevi Bonthu, S. Rama Sree, and M. H. M. Krishna Prasad. Automated short answer grading using deep learning: A survey. InProceedings of the International Cross-Domain Conference for Machine Learning and Knowledge Extraction, Virtual Event, August 2021

work page 2021

[2] [2]

The eras and trends of automatic short answer grading

Steven Burrows, Iryna Gurevych, and Benno Stein. The eras and trends of automatic short answer grading. International Journal of Artificial Intelligence in Education, 25:60–117, 2015

work page 2015

[3] [3]

Automatic short answer grading for finnish with chatgpt

Li-Hsin Chang and Filip Ginter. Automatic short answer grading for finnish with chatgpt. InProceedings of the AAAI Conference on Artificial Intelligence, Vancouver, Canada, March 2024

work page 2024

[4] [4]

Using large language models for automated grading of student writing about science.International Journal of Artificial Intelligence in Education, pages 1–35, 2025

Impey Chris, Wenger Matthew, Garuda Nikhil, Golchin Shahriar, and Stamer Sarah. Using large language models for automated grading of student writing about science.International Journal of Artificial Intelligence in Education, pages 1–35, 2025

work page 2025

[5] [5]

Training Verifiers to Solve Math Word Problems

Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems.arXiv preprint arXiv:2110.14168, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[6] [6]

Security and privacy challenges of large language models: A survey.ACM Computing Surveys, pages 1–34, 2025

Badhan Chandra Das, M Hadi Amini, and Yanzhao Wu. Security and privacy challenges of large language models: A survey.ACM Computing Surveys, pages 1–34, 2025

work page 2025

[7] [7]

nswvt- nvakgxpm

Yuning Ding, Brian Riordan, Andrea Horbach, Aoife Cahill, and Torsten Zesch. Don’t take “nswvt- nvakgxpm” for an answer–the surprising vulnerability of automatic content scoring systems to adversarial input. InProceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, December 2020

work page 2020

[8] [8]

SemEval-2013 task 7: The joint student response analysis and 8th recognizing textual entailment challenge

Myroslava Dzikovska, Rodney Nielsen, Chris Brew, Claudia Leacock, Danilo Giampiccolo, Luisa Ben- tivogli, Peter Clark, Ido Dagan, and Hoa Trang Dang. SemEval-2013 task 7: The joint student response analysis and 8th recognizing textual entailment challenge. InProceedings of the 7th International Workshop on Semantic Evaluation, Atlanta, Georgia, USA, June 2013

work page 2013

[9] [9]

Cheating automatic short answer grading with the adversarial usage of adjectives and adverbs.International Journal of Artificial Intelligence in Education, 34:616–646, 2024

Anna Filighera, Sebastian Ochs, Tim Steuer, and Thomas Tregel. Cheating automatic short answer grading with the adversarial usage of adjectives and adverbs.International Journal of Artificial Intelligence in Education, 34:616–646, 2024

work page 2024

[10] [10]

Fooling automatic short answer grading systems

Anna Filighera, Tim Steuer, and Christoph Rensing. Fooling automatic short answer grading systems. In Proceedings of the 21st International Conference on Artificial Intelligence in Education, Ifrane, Morocco, July 2020. 10

work page 2020

[11] [11]

Measuring mathematical problem solving with the math dataset

Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring mathematical problem solving with the math dataset. InProceedings of 34th Conference on Neural Information Processing Systems Datasets and Benchmarks Track, Virtual Event, December 2021

work page 2021

[12] [12]

Cheating during the college years: How do business school students compare?Journal of Business Ethics, 72:197–206, 2007

Helen A Klein, Nancy M Levenburg, Marie McKendall, and William Mothersell. Cheating during the college years: How do business school students compare?Journal of Business Ethics, 72:197–206, 2007

work page 2007

[13] [13]

A multilingual dataset of adversarial attacks to automatic content scoring systems

Ronja Laarmann-Quante, Christopher Chandler, Noemi Incirkus, Vitaliia Ruban, Alona Solopov, and Luca Steen. A multilingual dataset of adversarial attacks to automatic content scoring systems. InProceedings of the 20th Conference on Natural Language Processing, Vienna, Austria, September 2024

work page 2024

[14] [14]

Mwptoolkit: An open-source framework for deep learning-based math word problem solvers

Yihuai Lan, Lei Wang, Qiyuan Zhang, Yunshi Lan, Bing Tian Dai, Yan Wang, Dongxiang Zhang, and Ee-Peng Lim. Mwptoolkit: An open-source framework for deep learning-based math word problem solvers. InProceedings of the 36th AAAI Conference on Artificial Intelligence, Virtual Event, February 2022

work page 2022

[15] [15]

Advancing adversarial suffix transfer learning on aligned large language models

Hongfu Liu, Yuxi Xie, Ye Wang, and Michael Shieh. Advancing adversarial suffix transfer learning on aligned large language models. InProceedings of the Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA, November 2024

work page 2024

[16] [16]

Autodan: Generating stealthy jailbreak prompts on aligned large language models

Xiaogeng Liu, Nan Xu, Muhao Chen, and Chaowei Xiao. Autodan: Generating stealthy jailbreak prompts on aligned large language models. InProceedings of the 12th International Conference on Learning Representations, Vienna, Austria, May 2024

work page 2024

[17] [17]

Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study

Yi Liu, Gelei Deng, Zhengzi Xu, Yuekang Li, Yaowen Zheng, Ying Zhang, Lida Zhao, Tianwei Zhang, Kailong Wang, and Yang Liu. Jailbreaking chatgpt via prompt engineering: An empirical study.arXiv preprint arXiv:2305.13860, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[18] [18]

Comuniqa : Exploring large language models for improving speaking skills

Manas Mhasakar, Shikhar Sharma, Apurv Mehra, Utkarsh Venaik, Ujjwal Singhal, Dhruv Kumar, and Kashish Mittal. Comuniqa : Exploring large language models for improving speaking skills. InProceedings of the 7th ACM SIGCAS/SIGCHI Conference of Computing and Sustainable Societies, New Delhi, India, July 2024

work page 2024

[19] [19]

A survey on deep learning-based automated essay scoring and feedback generation.Artificial Intelligence Review, 58:1–40, 2025

Haile Misgna, Byung-Won On, Ingyu Lee, and Gyu Sang Choi. A survey on deep learning-based automated essay scoring and feedback generation.Artificial Intelligence Review, 58:1–40, 2025

work page 2025

[20] [20]

Autotutor meets large language models: A language model tutor with rich pedagogy and guardrails

Sankalan Pal Chowdhury, Vilém Zouhar, and Mrinmaya Sachan. Autotutor meets large language models: A language model tutor with rich pedagogy and guardrails. InProceedings of the 11th ACM Conference on Learning @ Scale, New York, NY , USA, July 2024

work page 2024

[21] [21]

Embeddings for automatic short answer grading: A scoping review.IEEE Transactions on Learning Technologies, 16:219–231, 2023

Marko Putnikovic and Jelena Jovanovic. Embeddings for automatic short answer grading: A scoping review.IEEE Transactions on Learning Technologies, 16:219–231, 2023

work page 2023

[22] [22]

Abscribe: Rapid exploration & organization of multiple writing variations in human-ai co-writing tasks using large language models

Mohi Reza, Nathan M Laundry, Ilya Musabirov, Peter Dushniku, Zhi Yuan “Michael” Yu, Kashish Mittal, Tovi Grossman, Michael Liut, Anastasia Kuzminykh, and Joseph Jay Williams. Abscribe: Rapid exploration & organization of multiple writing variations in human-ai co-writing tasks using large language models. InProceedings of the 2024 CHI Conference on Human ...

work page 2024

[23] [23]

Enhancing short answer grading with openai apis

Sebastian Speiser and Annegret Weng. Enhancing short answer grading with openai apis. InProceedings of the 21st International Conference on Information Technology Based Higher Education and Training, Paris, France, November 2024

work page 2024

[24] [24]

Unimodal regularisation based on beta distribution for deep ordinal regression.Pattern Recognition, 122:1–10, February 2022

Pedro Antonio Gutiérrez Víctor Manuel Vargas and César Hervás-Martínez. Unimodal regularisation based on beta distribution for deep ordinal regression.Pattern Recognition, 122:1–10, February 2022

work page 2022

[25] [25]

Jailbreak and guard aligned language models with only few in-context demonstrations.arXiv preprint arXiv:2310.06387, 2023

Zeming Wei, Yifei Wang, and Yisen Wang. Jailbreak and guard aligned language models with only few in-context demonstrations.arXiv preprint arXiv:2310.06387, 2023

work page arXiv 2023

[26] [26]

Factors associated with cheating among college students: A review.Research in Higher Education, 39:235–274, 1998

Bernard E Whitley. Factors associated with cheating among college students: A review.Research in Higher Education, 39:235–274, 1998

work page 1998

[27] [27]

An llm can fool itself: A prompt-based adversarial attack.arXiv preprint arXiv:2310.13345, 2023

Xilie Xu, Keyi Kong, Ning Liu, Lizhen Cui, Di Wang, Jingfeng Zhang, and Mohan Kankanhalli. An llm can fool itself: A prompt-based adversarial attack.arXiv preprint arXiv:2310.13345, 2023

work page arXiv 2023

[28] [28]

Evaluating the Performance of Large Language Models on GAOKAO Benchmark

Xiaotian Zhang, Chunyang Li, Yi Zong, Zhengyu Ying, Liang He, and Xipeng Qiu. Evaluating the performance of large language models on gaokao benchmark.arXiv preprint arXiv:2305.12474, 2023. 11

work page internal anchor Pith review Pith/arXiv arXiv 2023

[29] [29]

Boosting jailbreak attack with momentum

Yihao Zhang and Zeming Wei. Boosting jailbreak attack with momentum. InProceedings of the ICLR 2024 Workshop on Reliable and Responsible Foundation Models, Vienna, Austria, May 2024

work page 2024

[30] [30]

A survey of recent backdoor attacks and defenses in large language models.Transactions on Machine Learning Research, pages 1–28, 2025

Shuai Zhao, Meihuizi Jia, Zhongliang Guo, Leilei Gan, XIAOYU XU, Xiaobao Wu, Jie Fu, Feng Yichao, Fengjun Pan, and Anh Tuan Luu. A survey of recent backdoor attacks and defenses in large language models.Transactions on Machine Learning Research, pages 1–28, 2025

work page 2025

[31] [31]

Universal vulnerabilities in large language models: Backdoor attacks for in-context learning

Shuai Zhao, Meihuizi Jia, Luu Anh Tuan, Fengjun Pan, and Jinming Wen. Universal vulnerabilities in large language models: Backdoor attacks for in-context learning. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA, November 2024

work page 2024

[32] [32]

Can llm replace stack overflow? a study on robustness and reliability of large language model code generation

Li Zhong and Zilong Wang. Can llm replace stack overflow? a study on robustness and reliability of large language model code generation. InProceedings of the 38th AAAI Conference on Artificial Intelligence, Vancouver, Canada, February 2024

work page 2024

[33] [33]

Virtual context enhancing jailbreak attacks with special token injection

Yuqi Zhou, Lin Lu, Ryan Sun, Pan Zhou, and Lichao Sun. Virtual context enhancing jailbreak attacks with special token injection. InFindings of the Association for Computational Linguistics: EMNLP 2024, Miami, Florida, USA, November 2024

work page 2024

[34] [34]

Universal and Transferable Adversarial Attacks on Aligned Language Models

Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models.arXiv preprint arXiv:2307.15043, 2023. 12 A Effectiveness of CAS Metric Evaluating adversarial attack performance often relies solely on the ASR. However, this metric alone is insufficient to ref...

work page internal anchor Pith review Pith/arXiv arXiv 2023