Development of a Spanish Version of the Xerox Tagger

Fernando S\'anchez Le\'on (Laboratorio de Ling\"u\'istica Inform\'atica, Facultad de Filosof\'ia y Letras, Universidad Aut\'onoma de Madrid), Amalio F. Nieto Serrano (Departamento de Ingenier\'ia de Sistemas Telem\'aticos, Escuela Superior de Ingenieros de Telecomunicaciones, Universidad Polit\'ecnica de Madrid)

arxiv: cmp-lg/9505035 · v1 · pith:HX7SVSQNnew · submitted 1995-05-19 · cmp-lg · cs.CL

classification cmp-lg cs.CL

keywords spanish tagger corpus model order performed presented some


This paper describes work performed within the CRATER ({\em C}orpus {\em R}esources {\em A}nd {\em T}erminology {\em E}xt{\em R}action, MLAP-93/20) project, funded by the Commission of the European Communities. In particular, it addresses the issue of adapting the Xerox Tagger to Spanish in order to tag the Spanish version of the ITU (International Telecommunications Union) corpus. The model implemented by this tagger is briefly presented, along with some modifications made to it so that it can use some parameters that are not probabilistically estimated. Initial decisions, such as the tagset, the lexicon, and the training corpus, are also discussed. Finally, results are presented and the benefits of the {\em mixed model} are justified.
