Accuracy of Slovak Language Lemmatization and MSD Tagging – MorphoDiTa and SpaCy

Download
LLOD_2022 Book of Abstracts-94-96.pdf (61.78Kb)
Date
2022
Author
Garabík, Radovan
Mitana, Denis
Abstract
Slovak, as a “typical” Slavic language, is moderately inflected, with three or four genders and two grammatical numbers, all of which interact with the inflections in complicated and unpredictable ways. Inflection is realized primarily by suffixes, but with many irregularities: a single suffix encodes several grammatical categories at once, and the same suffix often reflects unrelated features in other words. Slovak is thus a typical inflectional language that is not amenable to heuristic analysis. For these reasons, lemmatization is often an indispensable step in all kinds of text processing (starting with full-text search), and full morphosyntactic analysis or description (MSD) is the core of corpus linguistic research. Given the central importance of lemmatization and MSD tagging in Slovak corpus linguistics, it is important to understand their limitations and to know what accuracy is achievable. Since modern approaches aim to use deep learning and huge language models, we evaluate the accuracy of lemmatization + MSD tagging in several common usage scenarios by comparing two tools: MorphoDiTa, a state-of-the-art “classical” lemmatizer and MSD tagger based on a perceptron, and spaCy, which uses a multilingual BERT language model.
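As a rough illustration of what "accuracy of lemmatization + MSD" can mean in practice, the following minimal sketch computes token-level accuracy for lemmas, for MSD tags, and for both jointly, given a gold-standard annotation and a tagger's output over the same tokens. This is not the authors' evaluation procedure, which the abstract does not describe. The `Token` class, the `accuracy` function, and the tag strings (`TAG_A`, `TAG_B`, ...) are hypothetical placeholders, and the sketch assumes both annotations share the same tokenization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One annotated token: surface form, lemma, and MSD tag (hypothetical structure)."""
    form: str
    lemma: str
    msd: str


def accuracy(gold, predicted):
    """Return token-level accuracy for lemma, MSD tag, and lemma+MSD jointly.

    Assumes gold and predicted are aligned one-to-one (same tokenization).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of tokens")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sample")

    n = len(gold)
    pairs = list(zip(gold, predicted))
    lemma_ok = sum(g.lemma == p.lemma for g, p in pairs)
    msd_ok = sum(g.msd == p.msd for g, p in pairs)
    # A token counts as jointly correct only if both lemma and tag match.
    joint_ok = sum(g.lemma == p.lemma and g.msd == p.msd for g, p in pairs)
    return {"lemma": lemma_ok / n, "msd": msd_ok / n, "joint": joint_ok / n}


# Illustrative data only; the tags are placeholders, not a real Slovak tagset.
gold = [
    Token("ženy", "žena", "TAG_A"),
    Token("videli", "vidieť", "TAG_B"),
    Token("veľké", "veľký", "TAG_C"),
    Token("mestá", "mesto", "TAG_D"),
]
predicted = [
    Token("ženy", "žena", "TAG_A"),       # lemma and tag correct
    Token("videli", "videl", "TAG_B"),    # wrong lemma, correct tag
    Token("veľké", "veľký", "TAG_X"),     # correct lemma, wrong tag
    Token("mestá", "mesto", "TAG_D"),     # lemma and tag correct
]

scores = accuracy(gold, predicted)
print(scores)
```

Reporting the joint figure alongside the two separate ones is a design choice: a tool may pick the right lemma with the wrong tag or the reverse, so the joint accuracy can never exceed either individual figure.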
URI
https://repository.mruni.eu/handle/007/18680
Collections
  • Mokslinių konferencijų medžiaga / Conference materials [645]

DSpace software copyright © 2002-2016  DuraSpace