pith. machine review for the scientific record.

arxiv: 2604.17569 · v1 · submitted 2026-04-19 · 💻 cs.CL

Recognition: unknown

MAPLE: A Meta-learning Framework for Cross-Prompt Essay Scoring

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 05:53 UTC · model grok-4.3

classification 💻 cs.CL
keywords automated essay scoring · meta-learning · prototypical networks · cross-prompt evaluation · transferable representations · quadratic weighted kappa · ELLIPSE · LAILA

The pith

MAPLE meta-learning framework improves cross-prompt automated essay scoring

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MAPLE as a way to handle the problem of scoring essays written for prompts the model has not seen before. Automated essay scoring models typically struggle when the writing prompt changes because the style, topic, and grading standards differ. MAPLE addresses this by using meta-learning with prototypical networks to build representations that transfer across prompts. This leads to better performance on new prompts, as shown by strong results on two of the three tested datasets.

Core claim

MAPLE is a meta-learning framework that leverages prototypical networks to learn transferable representations across different writing prompts. On the ELLIPSE and LAILA datasets it achieves state-of-the-art performance, outperforming strong baselines by 8.5 and 3 QWK points, respectively. On the ASAP dataset, where prompts have heterogeneous score ranges, it yields improvements on several traits, demonstrating its utility in unified scoring settings.
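The claim is stated in QWK points, so it is worth being precise about the metric. Quadratic weighted kappa penalizes disagreements between two raters by the squared distance between their ratings; a standard implementation (not the paper's code) looks like this:

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic penalty weights: a disagreement of k
    score levels costs k^2 / (n_classes - 1)^2."""
    O = np.zeros((n_classes, n_classes))               # observed rating matrix
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    i, j = np.indices((n_classes, n_classes))
    W = (i - j) ** 2 / (n_classes - 1) ** 2            # quadratic weights
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()  # chance agreement
    return 1 - (W * O).sum() / (W * E).sum()

# one off-by-one error out of five essays on a 0-3 scale
y_true = [0, 1, 2, 3, 3]
y_pred = [0, 1, 2, 3, 2]
print(round(quadratic_weighted_kappa(y_true, y_pred, 4), 3))  # → 0.918
```

Because the penalty is quadratic, a model that misses by one level retains most of its kappa, while large misses are punished heavily; an 8.5-point gain therefore reflects substantially closer agreement with human raters, not just more exact hits.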

What carries the argument

Prototypical networks within the MAPLE meta-learning framework, which learn prompt-agnostic essay representations for generalization to unseen prompts.
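The abstract does not spell out the prototype computation; as a rough sketch of how prototypical networks in the style of Snell et al. typically classify (assuming essays are already embedded by some encoder, which is our assumption here, not a detail from the paper):

```python
import numpy as np

def prototypes(support_emb, support_labels):
    """Mean support embedding per score class (the C_i of Figure 1)."""
    classes = np.unique(support_labels)
    return classes, np.stack([support_emb[support_labels == c].mean(axis=0)
                              for c in classes])

def predict(query_emb, classes, protos):
    """Assign each query essay the class of its nearest prototype
    (squared Euclidean distance, as in standard prototypical networks)."""
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d.argmin(axis=1)]

# toy episode: two score classes in a 4-d embedding space
rng = np.random.default_rng(0)
support = np.vstack([rng.normal(0, 0.1, (5, 4)), rng.normal(1, 0.1, (5, 4))])
labels = np.array([0] * 5 + [1] * 5)
classes, protos = prototypes(support, labels)
query = rng.normal(1, 0.1, (3, 4))        # drawn near class 1
print(predict(query, classes, protos))    # → [1 1 1]
```

The cross-prompt bet is that the encoder, meta-trained across many such episodes, places essays of the same quality near each other even when the prompt changes.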

Load-bearing premise

That the meta-learned representations from prototypical networks will generalize across prompts without major changes in writing style or scoring criteria.

What would settle it

A test on prompts with markedly different topics and rubrics where MAPLE fails to outperform conventional fine-tuned models in QWK score.

Figures

Figures reproduced from arXiv: 2604.17569 by May Bashendy, Salam Albatarni, Sohaila Eltanbouly, Tamer Elsayed.

Figure 1. MAPLE task generation. In meta-training, we explore two settings: binary and multiclass classification. Each sampled task includes a support set (for C_i computation) and a query set (for evaluation/learner update). In meta-testing, the task is multiclass, where the support set includes all training data and the query set corresponds to an unseen prompt. Finally, the model is updated based on its performa…
Figure 2. Overview of MAPLE showing (a) the meta-learner architecture and (b) the prediction process.
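Figure 1's episodic task generation can be sketched in a few lines: sample a prompt, then draw disjoint support and query sets from its essays. This is a minimal illustration of the episode structure, not the paper's sampling code:

```python
import random

def sample_episode(essays_by_prompt, n_support, n_query, seed=None):
    """Sample one meta-training task in the spirit of Figure 1: pick a
    prompt, then draw disjoint support and query sets from its essays."""
    rng = random.Random(seed)
    prompt = rng.choice(sorted(essays_by_prompt))
    pool = list(essays_by_prompt[prompt])
    rng.shuffle(pool)
    support = pool[:n_support]                   # used to compute prototypes C_i
    query = pool[n_support:n_support + n_query]  # used for the learner update
    return prompt, support, query

essays = {"p1": list(range(10)), "p2": list(range(10, 20))}
prompt, support, query = sample_episode(essays, n_support=4, n_query=3, seed=0)
assert not set(support) & set(query)             # disjoint by construction
```

At meta-test time the structure inverts, per the caption: the support set is all training data and the query set is the unseen prompt.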
read the original abstract

Automated Essay Scoring (AES) faces significant challenges in cross-prompt settings, where models must generalize to unseen writing prompts. To address this limitation, we propose MAPLE, a meta-learning framework that leverages prototypical networks to learn transferable representations across different writing prompts. Across three diverse datasets (ELLIPSE and ASAP (English), and LAILA (Arabic)), MAPLE achieves state-of-the-art performance on ELLIPSE and LAILA, outperforming strong baselines by 8.5 and 3 points in QWK, respectively. On ASAP, where prompts exhibit heterogeneous score ranges, MAPLE yields improvements on several traits, highlighting the strengths of our approach in unified scoring settings. Overall, our results demonstrate the potential of meta-learning for building robust cross-prompt AES systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes MAPLE, a meta-learning framework that uses prototypical networks to learn transferable representations for cross-prompt automated essay scoring. It reports state-of-the-art results on the ELLIPSE and LAILA datasets, outperforming baselines by 8.5 and 3 QWK points respectively, with additional improvements noted on the ASAP dataset for certain traits.

Significance. If the central claims hold, MAPLE could represent a meaningful advance in handling prompt variability in AES by leveraging meta-learning for prompt-agnostic representations. This has potential implications for building more robust scoring systems across diverse writing tasks and languages. However, the lack of detailed experimental validation in the abstract and the noted mismatch between prototypical networks and regression tasks raise questions about the generalizability of the approach.

major comments (2)
  1. Abstract: The abstract reports empirical gains of 8.5 and 3 QWK points on ELLIPSE and LAILA but provides no details on experimental setup, baselines, statistical significance, error bars, or data splits. This absence makes the central performance claims impossible to verify from the given text.
  2. Abstract / Methods: Prototypical networks are designed for classification tasks using class centroids and distance-based prediction. AES is a regression/ordinal task with prompt-specific score ranges and distributions (as noted for ASAP). The paper does not detail the adaptation (e.g., how distance to prototypes is mapped to numeric scores) or demonstrate that this preserves prompt-invariance under score-range shifts, which is load-bearing for the cross-prompt generalization claim.
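The second objection can be made concrete. One common way to turn prototype distances into a numeric score, and a plausible but unconfirmed guess at the adaptation the paper would need, is a softmax-weighted expectation over score levels:

```python
import numpy as np

def expected_score(query_emb, protos, score_levels, temperature=1.0):
    """Soft assignment: softmax over negative squared distances to the
    per-level prototypes, then the expectation over score levels.
    Yields a continuous score, unlike hard nearest-prototype prediction."""
    d = ((query_emb[None, :] - protos) ** 2).sum(axis=-1)
    logits = -d / temperature
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return float((p * np.asarray(score_levels)).sum())

# prototypes for score levels 1..4 placed along one embedding axis
protos = np.array([[0.0], [1.0], [2.0], [3.0]])
score = expected_score(np.array([1.4]), protos, [1, 2, 3, 4])
# a continuous value between the 2nd and 3rd score levels
```

Note that even this mapping inherits the prototypes' dependence on the support set's score range, so it does not by itself resolve the ASAP score-range-shift concern; that is exactly what the referee asks the authors to demonstrate.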

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments on our manuscript. We address each major comment below and have revised the manuscript to incorporate the suggested improvements.

read point-by-point responses
  1. Referee: Abstract: The abstract reports empirical gains of 8.5 and 3 QWK points on ELLIPSE and LAILA but provides no details on experimental setup, baselines, statistical significance, error bars, or data splits. This absence makes the central performance claims impossible to verify from the given text.

    Authors: We agree that the abstract is necessarily concise and omits granular experimental details. The full experimental setup, including data splits, baselines, evaluation protocol, and reporting of standard deviations across multiple runs, is provided in Sections 3 and 4. To improve verifiability from the abstract alone, we will revise it to briefly note the cross-prompt evaluation setting, the datasets used, and that results include standard deviations from repeated runs with statistical significance tests reported in the main text. revision: yes

  2. Referee: Abstract / Methods: Prototypical networks are designed for classification tasks using class centroids and distance-based prediction. AES is a regression/ordinal task with prompt-specific score ranges and distributions (as noted for ASAP). The paper does not detail the adaptation (e.g., how distance to prototypes is mapped to numeric scores) or demonstrate that this preserves prompt-invariance under score-range shifts, which is load-bearing for the cross-prompt generalization claim.

    Authors: This is a fair observation on the adaptation required for a regression task. While the Methods section outlines the meta-learning framework and use of prototypes derived from support-set representations, we acknowledge that the precise mapping from prototype distances/similarities to numeric scores and the handling of heterogeneous score ranges could be clarified further. In the revised version, we will expand the Methods section with an explicit description of the regression adaptation (including any learned projection or interpolation step) and add targeted analysis or supplementary experiments on the ASAP dataset to demonstrate that the learned representations remain effective under score-range shifts. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical meta-learning results only.

full rationale

The paper introduces MAPLE as a meta-learning framework adapting prototypical networks for cross-prompt AES, with performance evaluated empirically on ELLIPSE, ASAP, and LAILA datasets. No derivation chain, equations, or first-principles predictions exist that could reduce to inputs by construction. Claims rest on standard empirical comparisons to baselines (QWK improvements reported), without fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations. The central premise (transferable representations via meta-training) is tested via held-out prompt experiments and does not collapse to tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are identifiable from the abstract alone. The approach implicitly assumes standard meta-learning transferability without additional postulates.

pith-pipeline@v0.9.0 · 5435 in / 1047 out tokens · 48464 ms · 2026-05-10T05:53:35.757144+00:00 · methodology

discussion (0)

