
1.
Dataset of vocabulary in Uzbek primary education : extraction and analysis in case of the school corpus
Khabibulla Madatov, Sapura Sattarova, Jernej Vičič, 2025, original scientific article

Abstract: The main goal of this research is to determine the number of new words that a primary school pupil should acquire during each academic year. To accomplish this, we created two datasets. The first was compiled from the "Explanatory Vocabulary of the Uzbek Language" (EDUL). The second was created from 35 primary school textbooks for grades 1-4 approved by the Ministry of Preschool and School Education of the Republic of Uzbekistan; the authors named it the "Uzbek Primary School Corpus" (UPSC). Using the "Comparative Lemma Extraction Method" (CLEM) proposed by the authors, a vocabulary for grades 1-4 was created, and the number of new words that primary school pupils should learn each academic year was determined (disregarding word forms, as Uzbek is a morphologically rich language).
Keywords: Uzbek language, primary school, corpus construction, natural language processing (NLP), comparative lemma extraction method
Published in RUP: 08.08.2025; Views: 630; Downloads: 7
.pdf Full text (342,87 KB)

2.
Dataset of Uzbek verbs with formation and suffixes
Maksud Sharipov, Jernej Vičič, 2025, other scientific articles

Abstract: The main goal of this work is to create a dataset of Uzbek verbs. The dataset records the word from which each verb is derived and the affixes used in the derivation, with the affixes classified into distinct categories. With its help, it is possible to determine from which part of speech each Uzbek verb is derived and with which affixes. The dataset supports the identification of verbs in Uzbek texts and the development of rule-based models for their analysis, as well as the building of various artificial intelligence models for the morphological and syntactic analysis of Uzbek texts. Verbs play a crucial role in learning any language; therefore, students in schools and higher education institutions can also use this dataset during the learning process. The dataset serves as a valuable resource for researchers and practitioners interested in Uzbek language processing tasks.
Keywords: verb phrase, Uzbek language, Uzbek web corpus, verb form, verb affixes
Published in RUP: 02.06.2025; Views: 884; Downloads: 13
.pdf Full text (281,07 KB)

5.
Using the ToBI transcription to record the intonation of Slovene
Jana Volk, 2012, original scientific article

Abstract: The paper presents ToBI, a transcription method for prosodic annotation. ToBI is an acronym for Tones and Break Indices, which originally denoted an intonation system developed in the 1990s for annotating intonation and prosody in a database of spoken Mainstream American English. The MAE_ToBI transcription originally consists of six parts: the audio recording of the utterance, the fundamental frequency contour, and four parallel tiers for the tone sequence, the orthographic transcription, the break indices between words, and additional observations. The core of the transcription, i.e. of the phonological analysis of the intonation pattern, is the tone tier, where tonal variation is transcribed using labels for high tone and low tone, and where a tone can appear as a pitch accent, a phrase accent, or a boundary tone. Due to its simplicity and flexibility, the system soon began to be used for the prosodic annotation of other variants of English and many other languages, as well as in different non-linguistic fields, leading to the creation of many new ToBI systems adapted to individual languages and dialects. The author is the first to use this method for Slovene, more precisely for the intonational transcription and analysis of the corpus of spontaneous speech of Slovene Istria, in order to investigate whether the ToBI system is useful for the annotation of Slovene and its regional variants.
Keywords: corpus linguistics, intonation, Slovene, Slovene Istria, spontaneous speech, ToBI, ToBI transcription method, Tones and Break Indices
Published in RUP: 30.12.2015; Views: 5202; Downloads: 75
URL Link to full text
