AINL-2026

A multi-head-attention-based architecture for effective morphological tagging in Russian with an open dictionary

Скибин К., Пожидаев М., Сущенко С.

AINL-2026, Tomsk, 2026

Keywords: Morphological tagging, Multi-head attention, Transformer, Embedding, Positional encoding

The article proposes a new architecture based on multi-head attention for morphological tagging of Russian. Word-vector preprocessing splits words into subtokens, followed by a trainable procedure that aggregates the subtoken vectors into token vectors. This makes it possible to support an open dictionary and to analyze morphological features with regard to word parts (prefixes, endings, etc.). The open dictionary, in turn, allows the model to analyze words that are absent from the training dataset.
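The subtoken pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the vocabulary, embeddings, and scoring weights are hypothetical toy values, and the attention-style pooling merely stands in for the trainable aggregation procedure.

```python
import math

# Hypothetical subtoken vocabulary; a real system would learn one (e.g. via BPE).
VOCAB = {"бег", "ун", "ами", "при"}

def split_subtokens(word, vocab):
    """Greedy longest-match segmentation; unknown spans fall back to characters."""
    subtokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                subtokens.append(word[i:j])
                i = j
                break
        else:
            subtokens.append(word[i])
            i += 1
    return subtokens

def aggregate(vectors, score_w):
    """Attention-style pooling: score each subtoken vector, softmax, weighted sum."""
    scores = [sum(w * x for w, x in zip(score_w, v)) for v in vectors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

# Toy usage: segment a Russian word form and pool its subtoken vectors.
subtokens = split_subtokens("бегунами", VOCAB)   # ["бег", "ун", "ами"]
embeddings = {"бег": [1.0, 0.0], "ун": [0.0, 1.0], "ами": [0.5, 0.5]}
token_vec = aggregate([embeddings[s] for s in subtokens], score_w=[1.0, 1.0])
```

Because the segmentation preserves affix-like pieces ("бег", "ун", "ами"), the pooled token vector can carry morphological signal even for word forms never seen whole during training.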

A computational experiment on the SynTagRus and Taiga datasets shows that for some grammatical categories the proposed architecture achieves an accuracy of 98–99% or higher, outperforming previously known results. For nine out of ten words, the architecture correctly predicts all grammatical categories and indicates when a category is not applicable to the word.

At the same time, a model based on the proposed architecture can be trained on consumer-level graphics accelerators, retains the advantages of multi-head attention over RNNs (no RNNs are used in the proposed approach), does not require pre-training on large collections of unlabeled texts (as BERT does), and shows higher processing speed than previous solutions.

Introduction

Morphological tagging is the process of automatically identifying and classifying the grammatical characteristics of words in a text. It remains one of the key tasks in computational linguistics and its applications in software engineering. A successful solution matters both for fundamental linguistics, because in many cases it is inextricably tied to resolving homonymy, and for a number of practical applications, including machine translation, sentiment analysis, information extraction, and identification of linguistic patterns, because it helps clarify the meaning and content of a text. The homonymy problem is particularly acute in audiobook production, since ambiguity of grammatical attributes leads to frequent errors in stress placement.
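As a toy illustration of the task (feature names loosely follow the Universal Dependencies convention; the tags here are written by hand), the homonymous Russian form «стекло» receives different grammatical attributes depending on context:

```python
# "стекло" is a noun ("glass") in the first sentence and a past-tense verb
# ("flowed down", from "стечь") in the second; a morphological tagger must
# disambiguate the two readings from context.
tagged = {
    "Разбилось стекло.": {"стекло": {"POS": "NOUN", "Case": "Nom",
                                     "Gender": "Neut", "Number": "Sing"}},
    "Молоко стекло на пол.": {"стекло": {"POS": "VERB", "Tense": "Past",
                                         "Gender": "Neut", "Number": "Sing"}},
}
```

Note that some categories (e.g. Case) apply only to the nominal reading and some (e.g. Tense) only to the verbal one, which is why a tagger must also decide when a category is not applicable at all.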

For a human, solving this problem often involves understanding the meaning of a text, because it frequently requires determining the grammatical characteristics of invented words that are absent from dictionaries. Such a task arises when processing fiction in the fantasy genre, where authors tend to build fictional worlds. The ability to process such words is denoted below by the term «open dictionary».

The development of AI technologies can make a significant new contribution to solving this problem. In this work, special attention is paid to using the multi-head attention (MHA) architecture (Vaswani et al., 2017) in its plain form, i.e. without building a transformer pre-trained on large text collections, as in previous solutions based on the BERT model (Devlin et al., 2019). Using the plain MHA architecture significantly reduces the computing resources required compared to BERT and makes it possible to train a custom model on user data with consumer-level graphics accelerators. It also provides support for an open dictionary without a noticeable loss in quality compared to the best results obtained for Russian.
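For reference, plain multi-head attention can be sketched in a few lines of pure Python. This is a toy illustration of the mechanism only: the learned projection matrices of a real model are omitted, and heads simply split the feature dimension.

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over rows of Q, K, V."""
    d = len(K[0])
    scores = matmul(Q, [list(col) for col in zip(*K)])        # Q @ K^T
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, V)                                  # weights @ V

def multi_head_attention(Q, K, V, n_heads):
    """Split the feature dimension into heads, attend per head, concatenate."""
    d = len(Q[0])
    assert d % n_heads == 0
    h = d // n_heads
    outs = []
    for i in range(n_heads):
        sl = slice(i * h, (i + 1) * h)
        outs.append(attention([q[sl] for q in Q],
                              [k[sl] for k in K],
                              [v[sl] for v in V]))
    return [[x for head_row in rows for x in head_row] for rows in zip(*outs)]
```

A convenient sanity check: when all key rows are identical, the attention weights are uniform, so every output row equals the mean of the value rows.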