pith. machine review for the scientific record. sign in

arxiv: 1704.04859 · v2 · submitted 2017-04-17 · 💻 cs.CL

Recognition: unknown

Learning Character-level Compositionality with Visual Features

Authors on Pith no claims yet
classification 💻 cs.CL
keywords visualcharactercharacter-levelcharacterscompositionalitycreatingmodeldemonstrate
0
0 comments X
read the original abstract

Previous work has modeled the compositionality of words by creating character-level models of meaning, reducing problems of sparsity for rare words. However, in many writing systems compositionality has an effect even on the character-level: the meaning of a character is derived by the sum of its parts. In this paper, we model this effect by creating embeddings for characters based on their visual characteristics, creating an image for the character and running it through a convolutional neural network to produce a visual character embedding. Experiments on a text classification task demonstrate that such model allows for better processing of instances with rare characters in languages such as Chinese, Japanese, and Korean. Additionally, qualitative analyses demonstrate that our proposed model learns to focus on the parts of characters that carry semantic content, resulting in embeddings that are coherent in visual space.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MIXAR: Scaling Autoregressive Pixel-based Language Models to Multiple Languages and Scripts

    cs.CL 2026-04 unverdicted novelty 7.0

    MIXAR is the first autoregressive pixel-based language model for eight languages and scripts, with empirical gains on multilingual tasks, robustness to unseen languages, and further improvements when scaled to 0.5B pa...