Multi-Task Cross-Lingual Sequence Tagging from Scratch
We present a deep hierarchical recurrent neural network for sequence tagging. Given a sequence of words, our model employs deep gated recurrent units on both character and word levels to encode morphology and context information, and applies a conditional random field layer to predict the tags. Our model is task independent, language independent, and feature engineering free. We further extend our model to multi-task and cross-lingual joint training by sharing the architecture and parameters. Our model achieves state-of-the-art results in multiple languages on several benchmark tasks including POS tagging, chunking, and NER. We also demonstrate that multi-task and cross-lingual joint training can improve the performance in various cases.
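The final step the abstract describes, a conditional random field layer predicting tags from encoder outputs, can be illustrated with a minimal Viterbi decoding sketch. This is not the authors' implementation: the function name `viterbi_decode`, the plain-list score format, and the toy numbers in the usage note are assumptions made for illustration. In the described model the per-position emission scores would come from the word-level recurrent encoder; here they are supplied by hand.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][j]   : score of tag j at position t (hypothetical input;
                        in the described model it would be produced by
                        the word-level encoder).
    transitions[i][j] : score of moving from tag i to tag j.
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    # best[j] holds the best score of any path ending in tag j so far.
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for j in range(num_tags):
            # Pick the previous tag that maximizes the score of entering tag j.
            prev = max(range(num_tags),
                       key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path
```

With emissions `[[1.0, 0.0], [0.0, 0.5], [1.0, 0.0]]` and transitions `[[0.0, -2.0], [-2.0, 0.0]]` that penalize switching tags, the decoder returns `[0, 0, 0]`, whereas choosing the best tag independently at each position would give `[0, 1, 0]`. That difference is the reason to score whole tag sequences rather than individual positions.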
Forward citations
Cited by 1 Pith paper
- DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks
DropAttention regularizes attention weights in fully-connected self-attention networks to reduce overfitting and improve performance.