A Universal Part-of-Speech Tagset

Dipanjan Das; Ryan McDonald; Slav Petrov

arXiv: 1104.2086 · v1 · pith:IGH3CFB4new · submitted 2011-04-11 · cs.CL

classification: cs.CL

keywords: tagset, universal, part-of-speech, different, induction, mapping, treebank, unsupervised

To facilitate future research in unsupervised induction of syntactic structure and to standardize best-practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts-of-speech for 22 different languages. We highlight the use of this resource via two experiments, including one that reports competitive accuracies for unsupervised grammar induction without gold standard part-of-speech tags.
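
The mapping idea in the abstract can be illustrated with a small sketch. The abstract does not list the twelve categories or any mapping entries, so the tag names below and the Penn Treebank excerpt are given from memory of the publicly released resource and should be treated as assumptions. The function name `to_universal` and the fallback to `X` for unmapped tags are hypothetical choices made for illustration only.

```python
# Assumed names of the twelve universal categories (not listed in the abstract).
UNIVERSAL_TAGS = {
    "NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
    "ADP", "NUM", "CONJ", "PRT", ".", "X",
}

# Illustrative excerpt of a treebank-specific mapping (Penn Treebank tags).
# A real mapping covers every tag in the source tagset; this one does not.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB", "MD": "VERB",
    "JJ": "ADJ", "RB": "ADV", "PRP": "PRON", "DT": "DET",
    "IN": "ADP", "CD": "NUM", "CC": "CONJ", "RP": "PRT", "TO": "PRT",
    ".": ".", ",": ".", "FW": "X",
}


def to_universal(tagged_sentence, mapping=PTB_TO_UNIVERSAL, unknown="X"):
    """Replace each fine-grained tag with its universal category.

    Tags missing from the mapping fall back to `unknown` (a choice made
    for this sketch, not something the abstract specifies).
    """
    return [(word, mapping.get(tag, unknown)) for word, tag in tagged_sentence]


sentence = [("The", "DT"), ("dog", "NN"), ("barks", "VBZ"), (".", ".")]
converted = to_universal(sentence)
```

Applying one such table per treebank is what would let data from different languages share a single label set, which is the property the abstract relies on for its cross-lingual dataset.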

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Do Transformer Attention Heads Provide Transparency in Abstractive Summarization?
cs.CL 2019-07 unverdicted novelty 5.0

Analysis of transformer attention heads in abstractive summarization shows specialization in some heads and proposes a method to measure model reliance on learned attention distributions.