1.
Dataset of sentiment tagged language resources for Macedonian language
Sofija Kochovska,
Jernej Vičič,
Branko Kavšek, 2026, original scientific article
Abstract: Macedonian is a South Slavic language spoken by about 2 million people, primarily in North Macedonia and among diaspora communities worldwide. It has a few distinctive features. Most notably, it attaches definite articles to the end of nouns: for example, kniga (a book) becomes knigata (the book). Furthermore, it does not use grammatical cases, which makes its grammar relatively straightforward compared to other Slavic languages. The dataset comprises two lists of sentiment-annotated words that form the core of the Macedonian sentiment-annotated lexicon, a list of stopwords, a list of Affirmative and non-Affirmative words (AnA words) composed mostly of intensifiers and diminishers, and a list of polarity shifters. The presented materials are intended mainly for rule-based sentiment analysis, but some of the lists can be used much more broadly.
Keywords: Macedonian language, sentiment analysis, sentiment lexicon, rule-based methods, natural language processing, low-resource languages, AnA words, stopwords, intensifiers, diminishers, polarity shifters
Published in RUP: 20.01.2026; Views: 214; Downloads: 2
Full text (251,79 KB)