Unsupervised Keyword Extraction from Polish Legal Texts

Michał Jungiewicz; Michał Łopuszyński

arxiv: 1408.3731 · v2 · pith:JYANT65Snew · submitted 2014-08-16 · 💻 cs.CL

keywords: RAKE, algorithm, domain, extraction, keyword, legal, method, non-content

In this work, we present an application of the recently proposed unsupervised keyword extraction algorithm RAKE to a corpus of Polish legal texts from the field of public procurement. RAKE is essentially a language- and domain-independent method. Its only language-specific input is a stoplist containing a set of non-content words. The performance of the method depends heavily on the choice of this stoplist, which should be adapted to the domain. Therefore, we complement the RAKE algorithm with an automatic approach to selecting non-content words, which is based on the statistical properties of term distribution.
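As a rough illustration of the pipeline the abstract describes, the sketch below pairs a minimal RAKE-style scorer with an automatically built stoplist. It is not the authors' implementation. The abstract does not say which statistical property of the term distribution is used, so `build_stoplist` substitutes a simple document-frequency threshold purely as a stand-in; the function names and the `min_doc_fraction` parameter are hypothetical. The scoring follows the commonly described RAKE scheme: candidate phrases are split at stopwords and punctuation, each word is scored as degree divided by frequency, and a phrase scores the sum of its word scores.

```python
import re
from collections import Counter


def build_stoplist(documents, min_doc_fraction=0.8):
    """Stand-in heuristic (an assumption, not the paper's criterion):
    treat words that occur in at least min_doc_fraction of the documents
    as non-content words."""
    doc_freq = Counter()
    for doc in documents:
        # Count each word at most once per document.
        doc_freq.update(set(re.findall(r"\w+", doc.lower())))
    threshold = min_doc_fraction * len(documents)
    return {word for word, df in doc_freq.items() if df >= threshold}


def candidate_phrases(text, stoplist):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stoplist:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake(text, stoplist):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text, stoplist)
    freq = Counter()
    degree = Counter()
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences within the phrase, including the word itself.
            degree[word] += len(phrase)
    word_score = {word: degree[word] / freq[word] for word in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

A stoplist built this way could then be passed straight to the scorer, for example `rake(document, build_stoplist(corpus))`, where `corpus` is a list of document strings. Python's `\w` matches Unicode letters by default, so Polish diacritics are kept inside words.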
