Automatic Extraction of Lithuanian Cybersecurity Terms Using Deep Learning Approaches
Abstract
The paper presents the results of research on deep learning methods,
conducted to determine the most effective one for automatic extraction of Lithuanian
terms from a specialized domain (cybersecurity) with very restricted resources. A
semi-supervised approach to deep learning was chosen for the research as
Lithuanian is a less-resourced language and large amounts of data, necessary for
unsupervised methods, are not available in the selected domain. The findings of the
research show that a Bi-LSTM network with Bidirectional Encoder Representations
from Transformers (BERT) can achieve close to state-of-the-art results.