1. Dataset of sentiment tagged language resources for Macedonian language
Sofija Kochovska, Jernej Vičič, Branko Kavšek, 2026, original scientific article
Description: Macedonian is a South Slavic language spoken by about 2 million people, primarily in North Macedonia and among diaspora communities worldwide. It has a few distinctive features. Most notably, it attaches definite articles to the end of nouns: for example, kniga (a book) becomes knigata (the book). It also does not use grammatical cases, which makes its grammar relatively straightforward compared to other Slavic languages. The dataset comprises two lists of sentiment-annotated words that form the core of the Macedonian sentiment-annotated lexicon, a list of stopwords, a list of affirmative and non-affirmative words (AnA words) composed mostly of intensifiers and diminishers, and a list of polarity shifters. The materials are intended mainly for rule-based sentiment analysis, but some of the lists can be used much more broadly.
Keywords: Macedonian language, sentiment analysis, sentiment lexicon, rule-based methods, natural language processing, low-resource languages, AnA words, stopwords, intensifiers, diminishers, polarity shifters
Published in RUP: 20 January 2026; Views: 253; Downloads: 2
Full text (251.79 KB). This record has multiple files.
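To make the intended use of these lists concrete, here is a minimal sketch of a rule-based sentiment scorer that combines a lexicon, stopwords, intensifiers and diminishers, and polarity shifters. It is an illustration only, not the authors' method: the names `LEXICON`, `MODIFIERS`, `SHIFTERS`, `STOPWORDS`, and `score`, the English placeholder words, and all numeric weights are assumptions, and the dataset's actual file format and values are not reproduced here.

```python
# Hypothetical toy resources standing in for the dataset's lists.
# The words and weights below are placeholders, not entries from the dataset.
LEXICON = {"good": 1.0, "bad": -1.0}          # sentiment-annotated words
MODIFIERS = {"very": 1.5, "slightly": 0.5}    # intensifiers (>1) and diminishers (<1)
SHIFTERS = {"not"}                            # polarity shifters
STOPWORDS = {"the", "is", "a"}                # ignored during scoring


def score(tokens):
    """Sum lexicon polarities, applying pending modifiers and shifters
    to the next sentiment-bearing word."""
    total = 0.0
    weight = 1.0      # product of modifiers seen since the last content word
    negate = False    # whether a polarity shifter is pending
    for raw in tokens:
        tok = raw.lower()
        if tok in STOPWORDS:
            continue  # stopwords neither score nor reset pending modifiers
        if tok in SHIFTERS:
            negate = not negate
            continue
        if tok in MODIFIERS:
            weight *= MODIFIERS[tok]
            continue
        if tok in LEXICON:
            value = LEXICON[tok] * weight
            total += -value if negate else value
        # Any other word ends the scope of pending modifiers and shifters.
        weight = 1.0
        negate = False
    return total


if __name__ == "__main__":
    print(score("the film is very good".split()))
    print(score("not very bad".split()))
```

In this sketch a modifier or shifter only affects the next non-stopword token; real rule-based systems often use wider or syntax-aware scopes.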
2. TF-IDF-based classification of Uzbek educational texts
Khabibulla Madatov, Sapura Sattarova, Jernej Vičič, 2025, original scientific article
Description: This paper presents a baseline study on automatic Uzbek text classification. Uzbek is a morphologically rich, low-resource language, which makes reliable preprocessing and evaluation challenging. The approach combines a Term Frequency-Inverse Document Frequency (TF-IDF) representation with three conventional methods: linear regression (LR), k-Nearest Neighbors (k-NN), and cosine similarity (CS, implemented as a 1-NN retrieval model). The objective is to categorize school learning materials by grade level (grades 5–11) to support better alignment between curricular texts and students' intellectual development. A balanced dataset of Uzbek school textbooks across different subjects was constructed, preprocessed with standard NLP tools, and converted into TF-IDF vectors. Experimental results on the internal test set of 70 files show that LR achieved 92.9% accuracy (precision = 0.94, recall = 0.93, F1 = 0.93), while CS performed comparably with 91.4% accuracy (precision = 0.92, recall = 0.91, F1 = 0.92). In contrast, k-NN obtained only 28.6% accuracy, confirming its weakness in high-dimensional sparse feature spaces. External evaluation on seven Uzbek literary works further showed that LR and CS yielded consistent and interpretable grade-level mappings, whereas k-NN results were unstable. Overall, the findings establish reliable baselines for Uzbek educational text classification and highlight the potential of extending beyond lexical overlap toward semantically richer models in future work.
Keywords: Uzbek language, text classification, low-resource languages, TF-IDF, cosine similarity, linear regression, k-Nearest Neighbors
Published in RUP: 17 October 2025; Views: 390; Downloads: 4
Full text (286.87 KB). This record has multiple files.
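The cosine-similarity baseline described in this abstract (TF-IDF vectors with 1-NN retrieval) can be sketched in a few lines of standard-library Python. This is a hedged illustration, not the paper's implementation: the function names, the smoothed IDF formula, the whitespace tokenisation, and the English toy documents with grade labels are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for tokenised documents.

    Uses raw term counts and a smoothed IDF; this weighting is an
    assumption, since the abstract does not specify the exact variant.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_1nn(query_tokens, train_vectors, train_labels, idf):
    """Return the label of the most cosine-similar training document."""
    # Terms unseen in training have no IDF and are dropped from the query.
    q = {t: tf * idf[t] for t, tf in Counter(query_tokens).items() if t in idf}
    best = max(range(len(train_vectors)), key=lambda i: cosine(q, train_vectors[i]))
    return train_labels[best]


if __name__ == "__main__":
    # Hypothetical toy corpus: each document is labelled with a grade level.
    train_docs = [
        "count apples and pears".split(),
        "solve the quadratic equation".split(),
    ]
    train_labels = [5, 9]
    vectors, idf = tfidf_vectors(train_docs)
    print(predict_1nn("solve equation".split(), vectors, train_labels, idf))
```

Because the query is matched against every training document, this retrieval-style classifier needs no training step beyond computing the vectors, which is one reason it serves well as a baseline.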
3. Self-compassion around the world: measurement invariance of the Short form of the self-compassion scale (SCS-SF) across 65 nations, 40 languages, gender identities, and age groups
Viren Swami, Ulrich S. Tran, Martin Voracek, Toivo Aavik, Hamed Abdollahpour Ranjbar, Sulaiman Olanrewaju Adebayo, Reza Afhami, Oli Ahmed, Annie Aimé, Marwan Akel, Mirjam Koprivnik, Vita Poštuvan, 2025, original scientific article
Description:
Objectives: The 12-item Self-Compassion Scale–Short Form (SCS–SF) is a widely used instrument for the assessment of self-compassion. To date, there have been few examinations of this instrument's psychometric properties, particularly across nations and languages. Therefore, we used data from the Body Image in Nature Survey (BINS) to assess measurement invariance of the SCS–SF across nations, languages, gender identities, and age groups.
Methods: Participants (N = 56,968) from 65 nations completed the SCS–SF in 40 languages. Using these data, we tested various hypothesised models of the SCS–SF in the total sample and, using multi-group confirmatory factor analysis, tested for invariance of the optimal model across national groups, languages, gender identities, and age groups.
Results: In the total dataset, we found that an 11-item, 2-factor model (i.e., SCS-11) provided the best fit to the data, with the two factors tapping distinct constructs of compassionate and uncompassionate self-responding. The SCS-11 was found to be partially scalar invariant across national groups and languages, and fully scalar invariant across gender identities and age groups. There was wide variation in latent means for the two factors, particularly across national groups and languages. Further analyses showed negligible associations between the two factors and sociodemographic variables, including marital status, financial security, and urbanicity.
Conclusions: Our results suggest that it may be possible to derive a stable 2-factor model of the SCS–SF for use in cross-cultural research, but also highlight the likelihood of cross-national and cross-linguistic variations in the way that self-compassion is understood.
Keywords: Self-compassion scale - short form (SCS–SF), 65 nations, 40 languages
Published in RUP: 12 September 2025; Views: 524; Downloads: 3
Full text (1.44 MB). This record has multiple files.