1.
Dataset of sentiment tagged language resources for Macedonian language
Sofija Kochovska, Jernej Vičič, Branko Kavšek, 2026, original scientific article

Abstract: Macedonian is a South Slavic language spoken by about 2 million people, primarily in North Macedonia and among diaspora communities worldwide. It has a few distinctive features. Most notably, it attaches definite articles to the end of nouns: for example, kniga (a book) becomes knigata (the book). Furthermore, it does not use grammatical cases, which makes its grammar relatively straightforward compared to other Slavic languages. The dataset comprises two lists of sentiment-annotated words that form the core of the Macedonian sentiment-annotated lexicon, a list of stopwords, a list of Affirmative and non-Affirmative words (AnA words) composed mostly of intensifiers and diminishers, and a list of polarity shifters. The presented materials are intended mainly for rule-based sentiment analysis, but some of the lists can be used much more broadly.
Keywords: Macedonian language, sentiment analysis, sentiment lexicon, rule-based methods, natural language processing, low-resource languages, AnA words, stopwords, intensifiers, diminishers, polarity shifters
Published in RUP: 20.01.2026; Views: 214; Downloads: 2
.pdf Full text (251,79 KB)
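
To illustrate how the lists named in the abstract could work together in rule-based sentiment analysis, here is a minimal sketch. It is not the authors' method: the function name `score`, the English stand-in words, and the modifier weights are all hypothetical placeholders, and the real entries would come from the dataset files.

```python
# Hypothetical English stand-ins for the dataset's lists; not real dataset entries.
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "awful"}
STOPWORDS = {"the", "is", "a"}
# Intensifiers get a weight above 1, diminishers below 1 (weights are assumed).
MODIFIERS = {"very": 2.0, "slightly": 0.5}
# Polarity shifters flip the sign of the next sentiment word.
SHIFTERS = {"not"}


def score(text):
    """Return a signed sentiment score for a whitespace-tokenised text."""
    total = 0.0
    weight = 1.0   # pending intensifier/diminisher weight
    flip = False   # pending polarity shift
    for token in text.lower().split():
        if token in STOPWORDS:
            continue  # stopwords carry no sentiment and do not reset modifiers
        if token in MODIFIERS:
            weight *= MODIFIERS[token]
            continue
        if token in SHIFTERS:
            flip = not flip
            continue
        if token in POSITIVE:
            polarity = 1.0
        elif token in NEGATIVE:
            polarity = -1.0
        else:
            polarity = 0.0
        if polarity:
            total += (-polarity if flip else polarity) * weight
        # Any other word ends the scope of pending modifiers and shifters.
        weight, flip = 1.0, False
    return total
```

In this sketch a modifier or shifter applies only to the next non-stopword token, which is one simple scoping rule among several possible ones.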

2.
Dataset of vocabulary in Uzbek primary education : extraction and analysis in case of the school corpus
Khabibulla Madatov, Sapura Sattarova, Jernej Vičič, 2025, original scientific article

Abstract: The main goal of this research work is to determine the number of new words that a primary school pupil should acquire during each academic year. To accomplish this, we have created two datasets. The first dataset was compiled based on the "Explanatory Vocabulary of the Uzbek Language" (EDUL). The second dataset was created from 35 primary school textbooks for grades 1-4 approved by the Ministry of Preschool and School Education of the Republic of Uzbekistan, and it was named the "Uzbek Primary School Corpus" (UPSC) by the authors. Using the "Comparative Lemma Extraction Method" (CLEM) proposed by the authors of the article, a vocabulary for grades 1-4 was created, and the problem of determining the number of new words (disregarding word forms, as Uzbek is a morphologically rich language) that primary school pupils should learn each academic year was solved.
Keywords: Uzbek language, primary school, corpus construction, natural language processing (NLP), comparative lemma extraction method
Published in RUP: 08.08.2025; Views: 567; Downloads: 7
.pdf Full text (342,87 KB)
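
The counting problem described in the abstract can be sketched as a cumulative set difference over per-grade lemma sets. This is only an illustration under stated assumptions, not the authors' CLEM: it assumes lemmatisation has already been done, and the function name `new_lemmas_per_grade` and the placeholder lemma sets are hypothetical.

```python
def new_lemmas_per_grade(lemmas_by_grade):
    """Map each grade to the lemmas that first appear in that grade.

    `lemmas_by_grade` maps a grade number to an iterable of lemmas
    (word forms are assumed to be already reduced to lemmas).
    """
    seen = set()    # lemmas met in earlier grades
    result = {}
    for grade in sorted(lemmas_by_grade):
        current = set(lemmas_by_grade[grade])
        result[grade] = current - seen   # keep only lemmas not seen before
        seen |= current
    return result


# Placeholder lemma sets, not taken from the corpus.
corpus = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"a", "c", "d", "e"}, 4: {"e"}}
counts = {grade: len(words) for grade, words in new_lemmas_per_grade(corpus).items()}
```

Working on lemmas rather than surface forms matters here because, as the abstract notes, Uzbek is morphologically rich, so counting raw word forms would overstate the number of new words.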
