Lexical Complexity Prediction: An Overview

Kai North; Marcos Zampieri; Matthew Shardlow

arxiv: 2303.04851 · v1 · pith:AWMZB532new · submitted 2023-03-08 · 💻 cs.CL

classification 💻 cs.CL

keywords: complexity, lexical, prediction, approaches, computational, english, include, overview

The occurrence of unknown words in texts significantly hinders reading comprehension. To improve accessibility for specific target populations, computational modelling has been applied to identify complex words in texts and replace them with simpler alternatives. In this paper, we present an overview of computational approaches to lexical complexity prediction, focusing on work carried out on English data. We survey relevant approaches to this problem, which include traditional machine learning classifiers (e.g. SVMs, logistic regression) and deep neural networks, together with a variety of features, such as word frequency, word length, and features inspired by the psycholinguistics literature, among many others. Furthermore, we introduce readers to past competitions and available datasets on this topic. Finally, we include brief sections on applications of lexical complexity prediction, such as readability and text simplification, together with related studies on languages other than English.
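
As a rough illustration of the kind of traditional classifier the abstract mentions, the sketch below fits a logistic regression on two of the named features, word length and word frequency. It is a minimal toy example and not the method of any surveyed system: the word list, the frequency values, the labels, and the helper names (`features`, `train`, `predict_complexity`) are all invented for illustration.

```python
import math

# Hypothetical toy data: (word, frequency per million words, label),
# where label 1 = complex and 0 = simple. All values are invented.
TOY_DATA = [
    ("cat", 120.0, 0),
    ("house", 300.0, 0),
    ("water", 250.0, 0),
    ("friend", 180.0, 0),
    ("ubiquitous", 1.5, 1),
    ("perfunctory", 0.3, 1),
    ("obfuscate", 0.2, 1),
    ("quintessential", 0.8, 1),
]


def features(word, freq_per_million):
    """Two features: scaled word length and log-scaled word frequency."""
    return [len(word) / 10.0, math.log10(freq_per_million + 1.0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=2000, lr=0.5):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    weights = [0.0, 0.0]
    bias = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for word, freq, label in data:
            x = features(word, freq)
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = sigmoid(z) - label  # derivative of log-loss w.r.t. z
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_complexity(word, freq_per_million, weights, bias):
    """Return the estimated probability that the word is complex."""
    x = features(word, freq_per_million)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


WEIGHTS, BIAS = train(TOY_DATA)
```

Calling `predict_complexity("ubiquitous", 1.5, WEIGHTS, BIAS)` returns a probability of the word being complex; a real system would draw frequencies from a reference corpus and labels from an annotated dataset rather than from hand-written values.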

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

UOL@IDEM at BEA 2026 Shared Task 1: Neural Fusion and Feature-Rich Modeling for L1-Aware Vocabulary Difficulty Prediction
cs.CL 2026-06 unverdicted novelty 2.0

A feature-rich regression model using multilingual embeddings and features for frequency, cognate similarity, and predictability reports RMSE scores of 1.132, 1.037, and 0.891 for L1-aware vocabulary difficulty predic...