pith. machine review for the scientific record.

arxiv: 2605.02402 · v1 · submitted 2026-05-04 · 💻 cs.CL · cs.AI


Automatic Reflection Level Classification in Hungarian Student Essays


Pith reviewed 2026-05-08 18:30 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords Hungarian language · reflection level classification · machine learning · transformer models · class imbalance · student essays · automated assessment · educational NLP

The pith

Classical machine learning models classify reflection levels in Hungarian student essays at 71 percent average performance, slightly ahead of transformers at 68 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that machine learning can automate assessment of reflective writing in Hungarian, a language with little prior work on the task. The authors assembled nearly two thousand expert-labeled student essays on a four-level reflection scale and compared classical models using TF-IDF and semantic features with fine-tuned Hungarian transformers. Multiple strategies for addressing class imbalance were compared through ablation experiments. Shallow models delivered the higher overall score, while transformers handled the rarer reflection categories more reliably. This matters because reflective thinking is a valued educational skill, yet manual evaluation remains slow and subjective, limiting how widely it can be practiced at scale.

Core claim

In the first comprehensive study of automatic reflection-level classification for Hungarian, a dataset of 1,954 expert-annotated student essays on a four-level scale is used to evaluate classical machine learning pipelines and fine-tuned transformers. With appropriate feature engineering and imbalance handling, the shallow models reach an overall score of up to 71% (averaged over accuracy, F1-score, and ROC AUC), outperforming the transformer approach at 68% overall, while the transformers demonstrate better generalization on minority classes.

What carries the argument

The four-level expert-annotated reflection scale on Hungarian student essays, used to compare classical feature-based classifiers against transformer-based document classifiers under multiple class-imbalance correction methods.
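The classical side of this comparison can be sketched in a few lines. Everything below (the toy essays, the n-gram settings, the choice of logistic regression) is an illustrative assumption, not the paper's actual configuration:

```python
# Hypothetical sketch of a classical feature-based pipeline: TF-IDF features
# feeding a shallow classifier on the four-level reflection scale (0..3).
# Feature settings and model choice are assumptions for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

essays = [
    "Ma tanultam valamit, de nem gondolkodtam rajta.",                    # toy level-0 essay
    "Átgondoltam, miért sikerült jól a feladat, és mit tennék másképp.",  # toy level-2 essay
]
labels = [0, 2]

pipeline = Pipeline([
    # Word unigrams and bigrams are a common default for such pipelines.
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=20000)),
    # class_weight="balanced" is one of the imbalance remedies the paper examines.
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
pipeline.fit(essays, labels)
print(pipeline.predict(essays))
```

The point of the sketch is the shape of the pipeline, not the numbers: feature extraction and classifier are swappable parts, which is what makes the paper's ablations over features and balancing strategies possible.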

If this is right

  • Classical models with targeted feature engineering stay competitive for text classification in morphologically rich low-resource languages.
  • Transformer models offer an advantage when accurate identification of minority reflection levels is the priority.
  • Class weighting, oversampling, augmentation, and adjusted loss functions each improve robustness on imbalanced educational text.
  • The released Hungarian dataset supplies a reproducible base for extending automated reflective analysis to related tasks.
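Two of the balancing strategies listed above can be sketched concretely. The class counts below are invented for illustration and are not the paper's actual label distribution:

```python
# Sketch of two common imbalance remedies on an assumed skewed four-class
# label distribution: inverse-frequency class weighting and naive random
# oversampling of minority classes.
import numpy as np

labels = np.array([0] * 120 + [1] * 40 + [2] * 25 + [3] * 15)  # invented counts

# Inverse-frequency weights: rare classes get proportionally larger weights.
classes, counts = np.unique(labels, return_counts=True)
weights = len(labels) / (len(classes) * counts)
print(dict(zip(classes.tolist(), weights.round(2).tolist())))

# Random oversampling: resample every class up to the majority-class size.
rng = np.random.default_rng(0)
target = counts.max()
resampled = np.concatenate([
    rng.choice(np.flatnonzero(labels == c), size=target, replace=True)
    for c in classes
])
balanced = labels[resampled]
print(np.bincount(balanced))  # every class now has `target` examples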

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The results suggest that in many non-English educational settings, simpler classical pipelines may deliver adequate performance with far less compute than transformer fine-tuning.
  • The same modeling approach could transfer to reflection assessment in other languages that share Hungarian's morphological complexity once comparable labeled collections exist.
  • Embedding the classifiers in writing platforms could enable immediate feedback that helps students improve reflective skills without added teacher workload.
  • Hybrid systems that route examples to either classical or transformer components based on class frequency might combine the observed strengths of both.

Load-bearing premise

The four-level reflection labels assigned by education experts are consistent and accurately represent students' reflective thinking.

What would settle it

Independent re-annotation of the same essays by a separate group of experts would settle it: substantially different level assignments, or sharply lower model performance on the new labels, would indicate that the original ground truth is unreliable.
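Such a re-annotation exercise would typically be scored with a chance-corrected agreement statistic. A minimal sketch using Cohen's kappa on invented label vectors:

```python
# Sketch of how a re-annotation would be quantified: Cohen's kappa between
# the original and new four-level labels. Both label vectors are invented.
from sklearn.metrics import cohen_kappa_score

original = [0, 1, 1, 2, 3, 2, 0, 1]
reannotated = [0, 1, 2, 2, 3, 1, 0, 1]

# For an ordinal scale like reflection levels, weights="quadratic" would
# penalise distant disagreements more than adjacent ones.
kappa = cohen_kappa_score(original, reannotated)
print(round(kappa, 2))
```

A kappa near zero on a full re-annotation would be the "substantially different assignments" outcome described above; values conventionally read as "substantial agreement" start around 0.6, though any threshold is a judgment call.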

Figures

Figures reproduced from arXiv: 2605.02402 by Kinga Gyöngy, Kristian Fenech, Mónika Sándor, Mónika Serfőző, Zsolt Csibi.

Figure 1. Distribution of reflection levels in the dataset. From 0 (no reflec…
Figure 2. Token count distribution generated by the HuBERT tokenizer.
Figure 3. Confusion matrix (a) and ROC curves (b) of the best shallow…
Figure 4. Confusion matrix (a) and ROC curves (b) of an example where…
Figure 5. (a) ROC curve of the HuBERT model with backtranslation and focal…
Original abstract

Reflective thinking is a key competency in education, but assessing reflective writing remains a time-consuming and subjective task for education experts. While automated reflective analysis has been explored in several languages, Hungarian language was not researched extensively. In this paper, we present the first comprehensive study on automatic reflection level classification in Hungarian student essays. We used a large, expert-annotated Hungarian dataset consisting of 1,954 reflective essays collected over multiple academic years and labeled on a four-level reflection scale. We investigate two approaches: (1) classical machine learning models using TF-IDF and semantic embedding features, and (2) Hungarian-specific transformer models fine-tuned for document-level reflection classification. To address the strong class imbalance in the dataset, we systematically examine class weighting, oversampling, data augmentation, and alternative loss functions. An extensive ablation study is conducted to analyze the contribution of each modeling and balancing strategy. Our results show that shallow machine learning models with appropriate feature engineering achieve strong overall performance, reaching up to 71% overall score averaged over accuracy, F1-score, and ROC AUC metrics, while transformer-based models achieve slightly lower overall score (68%) averaged over the same metrics, but demonstrate better generalization on minority reflection classes. These findings highlight the continued relevance of classical methods for low-resource settings and the robustness of transformer models for imbalanced classification. The proposed dataset and experimental insights provide a solid foundation for future research on automated reflective analysis in Hungarian and other morphologically rich languages.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript presents the first study on automatic classification of reflection levels in Hungarian student essays. Using a dataset of 1,954 expert-annotated essays labeled on a four-level scale, it compares classical ML models with TF-IDF and embedding features against fine-tuned Hungarian transformer models. Various strategies for handling class imbalance are explored through ablations, with results indicating classical models achieve an averaged score of 71% (across accuracy, F1-score, and ROC AUC) compared to 68% for transformers, though transformers perform better on minority classes.

Significance. This work is significant for introducing automated analysis to Hungarian reflective writing, a low-resource language setting. The systematic examination of balancing techniques and ablations provides valuable insights for imbalanced classification tasks. If the ground truth is reliable, it supports the relevance of classical methods in such scenarios and offers a foundation for future work in morphologically rich languages.

major comments (3)
  1. [§3 (Dataset and Annotation)] No inter-annotator agreement (IAA) metrics, such as Cohen's kappa or Krippendorff's alpha, are reported for the expert annotations on the four-level reflection scale. Since reflection-level assessment is inherently subjective, the absence of IAA leaves the reliability of the ground-truth labels unverified, which is load-bearing for all performance claims in the results sections.
  2. [§4 (Experimental Setup)] The evaluation protocol lacks details on the train/validation/test splits, whether stratified sampling was used given the imbalance, and any statistical significance tests (e.g., McNemar's test or paired t-tests) for the reported differences between model performances (71% vs 68%). This makes it difficult to assess the robustness of the headline comparison.
  3. [§5 (Results)] The 'overall score' is defined as the average of accuracy, F1-score, and ROC AUC, but it is unclear whether these are macro-averaged or weighted, and how ROC AUC is computed for multi-class (one-vs-rest?). This affects interpretation of the 71% and 68% figures.
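The significance test requested in the second major comment is straightforward to sketch. The discordant-pair counts below are invented, and the exact two-sided McNemar p-value is computed from the binomial distribution:

```python
# Sketch of an exact McNemar test on the examples where exactly one of the
# two models is correct. The counts b and c are invented for illustration.
from math import comb

b = 34  # classical correct, transformer wrong (assumed count)
c = 21  # transformer correct, classical wrong (assumed count)

# Exact two-sided McNemar p-value: a binomial test of min(b, c) or fewer
# successes in b + c trials with p = 0.5, doubled and capped at 1.
n, k = b + c, min(b, c)
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
print(round(p_value, 3))
```

Only the discordant pairs matter; essays both models get right (or both get wrong) carry no information about which model is better, which is why McNemar's test is the standard choice for comparing two classifiers on the same test set.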
minor comments (3)
  1. [Abstract] The abstract mentions 'up to 71%' but the full results should clarify whether this is the best single model or an average across configurations.
  2. [Tables] Ensure all tables reporting metrics include standard deviations if multiple runs were performed, and specify the exact number of runs.
  3. [References] Consider adding references to prior work on reflection classification in other languages for better context.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed review of our manuscript. We have addressed each major comment point by point below and revised the paper accordingly to improve clarity and transparency.

read point-by-point responses
  1. Referee: §3 (Dataset and Annotation): No inter-annotator agreement (IAA) metrics, such as Cohen's kappa or Krippendorff's alpha, are reported for the expert annotations on the four-level reflection scale. Since reflection level assessment is inherently subjective, the absence of IAA leaves the reliability of the ground truth labels unverified, which is load-bearing for all performance claims in the results sections.

    Authors: We agree that IAA reporting is essential for subjective annotation tasks. The full dataset was annotated by a single expert in educational psychology, as multiple annotators with the required domain expertise were not available within our resource constraints. We have revised §3 to describe the annotation protocol, guideline development via pilot studies, and to explicitly state this single-annotator limitation along with its implications for ground-truth reliability. revision: yes

  2. Referee: §4 (Experimental Setup): The evaluation protocol lacks details on the train/validation/test splits, whether stratified sampling was used given the imbalance, and any statistical significance tests (e.g., McNemar's test or paired t-tests) for the reported differences between model performances (71% vs 68%). This makes it difficult to assess the robustness of the headline comparison.

    Authors: We have expanded §4 in the revised manuscript to specify the 80/10/10 train/validation/test split ratios and confirm that stratified sampling was applied based on the four reflection levels to preserve class distributions. We have also added McNemar's test results comparing the best classical and transformer models to assess the statistical significance of the performance differences. revision: yes

  3. Referee: §5 (Results): The 'overall score' is defined as the average of accuracy, F1-score, and ROC AUC, but it is unclear if these are macro-averaged or weighted, and how ROC AUC is computed for multi-class (one-vs-rest?). This affects interpretation of the 71% and 68% figures.

    Authors: We have clarified the metric computation in the revised §5: accuracy is the standard multi-class accuracy; F1-score is macro-averaged; and ROC AUC uses the one-vs-rest approach with macro-averaging. The overall score is the unweighted arithmetic mean of these three values. A supplementary table with the individual metric breakdowns for all models has been added for full transparency. revision: yes
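Under the clarified definition, the overall score can be sketched directly. The labels and per-class probabilities below are toy values, not the paper's data:

```python
# Sketch of the 'overall score' as clarified in the rebuttal: the unweighted
# mean of multi-class accuracy, macro-averaged F1, and one-vs-rest
# macro-averaged ROC AUC. All inputs are invented toy values.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true = np.array([0, 1, 2, 3, 1, 0, 2, 3])
y_prob = np.array([  # per-class probabilities over the four reflection levels
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.4, 0.3, 0.2, 0.1],  # one deliberate mistake: predicts 0, true is 1
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
])
y_pred = y_prob.argmax(axis=1)

overall = np.mean([
    accuracy_score(y_true, y_pred),
    f1_score(y_true, y_pred, average="macro"),
    roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro"),
])
print(round(overall, 3))
```

Note that the three components need not move together: macro-F1 punishes minority-class errors harder than accuracy does, which is exactly the axis on which the paper reports transformers and shallow models trading places.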

Circularity Check

0 steps flagged

No circularity: purely empirical ML evaluation on held-out data

full rationale

The paper describes an empirical pipeline: collection of 1,954 Hungarian essays, expert annotation on a four-level scale, extraction of TF-IDF and embedding features, training of classical ML and transformer models, handling of class imbalance via weighting/oversampling/augmentation, and reporting of accuracy/F1/ROC-AUC on (presumably held-out) test data. No equations, first-principles derivations, or predictions appear; results are measured against external labels rather than being forced by construction from fitted inputs or self-citations. The central claims (71% vs 68% averaged scores, transformers better on minorities) are therefore falsifiable against the fixed annotations and do not reduce to any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The claims rest on the validity of the annotation process and the assumption that the collected essays represent typical student reflective writing.

free parameters (1)
  • model hyperparameters and balancing parameters
    Various parameters for class weighting, oversampling, and loss functions are tuned but not specified as fixed values in the abstract.
axioms (1)
  • domain assumption: Expert annotations on the reflection scale are accurate and consistent.
    The classification performance is measured against these labels, so their quality is foundational.

pith-pipeline@v0.9.0 · 5586 in / 1332 out tokens · 67357 ms · 2026-05-08T18:30:16.415811+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

57 extracted references · 42 canonical work pages · 5 internal anchors
