Set-Valued prediction for Part-of-Speech tagging / Stefan Heid ; 1. Reviewer Prof. Dr. Eyke Hüllermeier, 2. Reviewer Prof. Dr. Michaela Geierhos

Bibliographic record

Author
Heid, Stefan
Contributors
Hüllermeier, Eyke ; Geierhos, Michaela
Published
Paderborn, 2019
Extent
1 online resource (xi, 72 pages) : diagrams
Thesis
Universität Paderborn, Master's thesis, 2019
Note
Date of submission: 2 December 2019
Language
English
Document type
Master's thesis
URN
urn:nbn:de:hbz:466:2-37152
DOI
10.17619/UNIPB/1-957

Abstract

Part-of-speech (POS) tagging is the task of predicting a sequence of word classes for a given sequence of words. Set-valued prediction allows a classifier to make cautious predictions in the face of uncertainty. In this thesis, we present a method for combining set-valued prediction with POS tagging to obtain more reasonable predictions on difficult data. The size of the predicted set allows the tagger to express its uncertainty about a specific prediction. The devised method can be applied to any POS tagger capable of predicting a posterior distribution over the tags and provides set-valued predictions in a post-processing step. We implemented the method using the state-of-the-art tagger ore as our basis. The tagger is tuned to a diachronic corpus of Middle Low German that spans a wide spatial area. Because the corpus also captures human annotator uncertainty, special performance measures have been devised to properly evaluate the tagging performance. The resulting algorithm clearly outperforms our baseline in all considered measures. Our evaluation shows that set-valued prediction yields good predictions, with utility scores exceeding the accuracy score by large margins. This is especially evident in robustness tests that are difficult for the classifier. Results are compared against a baseline tagger, which profits even more from set-valued prediction.
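The abstract describes the set-valued step only as post-processing on top of a tagger's posterior distribution over tags. The following is a minimal sketch of one such decision rule, not the thesis's implementation. It assumes a utility of the form u(Y, y) = [y in Y] * (1 + beta^2) / (|Y| + beta^2), an F_beta-style set utility from the set-valued prediction literature that may differ from the utilities used in the thesis. For any utility that depends only on whether the true tag is in the set and on the set size, the best set of size k is the k most probable tags, so scanning prefixes of the sorted posterior suffices. All function names and tag labels are hypothetical.

```python
from typing import Dict, List


def size_weight(k: int, beta: float = 1.0) -> float:
    # Assumed utility weight g(k) = (1 + beta^2) / (k + beta^2):
    # a correct singleton scores 1, larger correct sets score less.
    return (1.0 + beta ** 2) / (k + beta ** 2)


def bayes_optimal_set(posterior: Dict[str, float], beta: float = 1.0) -> List[str]:
    """Return the tag set maximising expected utility g(|Y|) * P(y in Y).

    Only top-k prefixes of the sorted posterior need to be checked, because
    for a fixed size k the k most probable tags maximise P(y in Y).
    """
    ranked = sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)
    best_set: List[str] = []
    best_score = float("-inf")
    cumulative = 0.0  # probability mass covered by the top-k tags
    for k, (_, prob) in enumerate(ranked, start=1):
        cumulative += prob
        score = size_weight(k, beta) * cumulative
        if score > best_score:
            best_score = score
            best_set = [tag for tag, _ in ranked[:k]]
    return best_set


def tag_sentence(posteriors: List[Dict[str, float]], beta: float = 1.0) -> List[List[str]]:
    # Hypothetical post-processing step: apply the set-valued decision
    # independently to each token's posterior distribution.
    return [bayes_optimal_set(p, beta) for p in posteriors]
```

With beta = 1, a confident posterior such as {NOUN: 0.9, VERB: 0.05, ADJ: 0.05} yields the singleton [NOUN], while an ambiguous one such as {NOUN: 0.45, VERB: 0.45, ADJ: 0.10} yields the two-tag set [NOUN, VERB]. Larger beta values make larger sets cheaper and so favour more cautious predictions.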
