1.
Dataset of Uzbek base words : extraction and data analysis based on the school corpus
Khabibulla Madatov, Surayyo Khajibaeva, Jernej Vičič, 2026, original scientific article

Description: The article presents a dataset of Uzbek base words extracted from a purpose-built corpus using the Synonym Thesaurus Support method. The method identifies the base words of each school grade by analysing a large text corpus of 142 textbooks used in school education in Uzbekistan. In this article and in the proposed dataset, a base word is defined as the word within a synonymic series that (i) is the most widely used, (ii) is semantically clear and stable, and (iii) is stylistically neutral. Following this approach, the school textbooks were analysed in three blocks: Primary (school grades 1-4), Basic Secondary (school grades 5-9), and Secondary (school grades 10-11). For each school grade, the base words that stand out from the general corpus were identified; the method extracted new base words that are specific to the observed grade and were not found in previous grades. The core idea of the method is to extract base words from the lemma set of each school grade using a corpus of synonyms. This makes it possible to analyse the lexical complexity and grade-specific vocabulary richness of texts intended for schoolchildren. The final results are lists of base words extracted specifically from primary (grades 1-4), basic secondary (grades 5-9), and secondary (grades 10-11) school texts, containing 17,599, 48,203, and 20,491 base words, respectively.
Keywords: school corpus, base word, basic vocabulary, Uzbek language
Published in RUP: 20.05.2026; Views: 196; Downloads: 9
Full text (.pdf, 2.42 MB)
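The grade-by-grade extraction described above can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: lemmatisation has already been done, the synonym thesaurus is given as lists whose first element is the series' base word, and the function name `base_words_per_grade` and the toy tokens are hypothetical. It is not the authors' implementation of the Synonym Thesaurus Support method.

```python
from typing import Dict, List, Set


def base_words_per_grade(
    grade_lemmas: Dict[int, Set[str]],
    synonym_series: List[List[str]],
) -> Dict[int, Set[str]]:
    """Return, for each grade, the base words first seen in that grade.

    Hypothetical sketch; assumes each synonymic series lists its base word first.
    """
    # Designated base word of every non-empty synonymic series (assumption).
    thesaurus_bases = {series[0] for series in synonym_series if series}
    seen: Set[str] = set()
    result: Dict[int, Set[str]] = {}
    for grade in sorted(grade_lemmas):
        # Base words that occur among this grade's lemmas...
        present = grade_lemmas[grade] & thesaurus_bases
        # ...minus those already found in earlier grades.
        result[grade] = present - seen
        seen |= present
    return result


if __name__ == "__main__":
    # Toy, hypothetical data: lemma sets per grade and synonymic series.
    grades = {1: {"w1", "w2", "x"}, 2: {"w1", "w3"}, 3: {"w4", "w2"}}
    series = [["w1", "s1"], ["w2", "s2"], ["w3"], ["w4", "s4", "s5"]]
    result = base_words_per_grade(grades, series)
    print({g: sorted(ws) for g, ws in result.items()})
    # prints {1: ['w1', 'w2'], 2: ['w3'], 3: ['w4']}
```

The same function could be applied at block level by keying the input with block indices (for example 0 = Primary, 1 = Basic Secondary, 2 = Secondary) and passing the union of each block's grade lemma sets.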

2.
Dataset of vocabulary in Uzbek primary education : extraction and analysis in case of the school corpus
Khabibulla Madatov, Sapura Sattarova, Jernej Vičič, 2025, original scientific article

Description: The main goal of this research is to determine how many new words a primary school pupil should know or acquire during each academic year. To this end, we created two datasets. The first was compiled from the "Explanatory Vocabulary of the Uzbek Language" (EDUL). The second was created from 35 primary school textbooks for grades 1-4 approved by the Ministry of Preschool and School Education of the Republic of Uzbekistan; the authors named it the "Uzbek Primary School Corpus" (UPSC). Using the "Comparative Lemma Extraction Method" (CLEM) proposed by the authors, a vocabulary for grades 1-4 was created, and the number of new words (disregarding word forms, since Uzbek is a morphologically rich language) that primary school pupils should learn in each academic year was determined.
Keywords: Uzbek language, primary school, corpus construction, natural language processing (NLP), comparative lemma extraction method
Published in RUP: 08.08.2025; Views: 986; Downloads: 8
Full text (.pdf, 342.87 KB)
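The per-year count of new words can be illustrated with a minimal Python sketch of a comparative lemma extraction. The abstract does not give CLEM's exact steps, so this rests on stated assumptions: texts are already lemmatised, the EDUL-based dataset acts as a filter of valid lemmas, and a lemma counts as "new" in a grade if no earlier grade contained it. The function name `new_words_per_grade` and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def new_words_per_grade(
    grade_lemmas: Dict[int, Set[str]],
    dictionary: Set[str],
) -> List[Tuple[int, int, int]]:
    """Return (grade, new lemmas, cumulative vocabulary size) per grade.

    Hypothetical sketch of a comparative lemma extraction, not the authors' CLEM.
    """
    known: Set[str] = set()
    rows: List[Tuple[int, int, int]] = []
    for grade in sorted(grade_lemmas):
        # Keep only lemmas attested in the reference dictionary (assumption).
        valid = grade_lemmas[grade] & dictionary
        # A lemma is "new" if no earlier grade contained it.
        new = valid - known
        known |= new
        rows.append((grade, len(new), len(known)))
    return rows


if __name__ == "__main__":
    # Toy, hypothetical lemma sets for grades 1-4; "z" is not in the dictionary.
    grades = {1: {"a", "b", "z"}, 2: {"a", "c"}, 3: {"c", "d", "e"}, 4: {"b"}}
    dictionary = {"a", "b", "c", "d", "e"}
    print(new_words_per_grade(grades, dictionary))
    # prints [(1, 2, 2), (2, 1, 3), (3, 2, 5), (4, 0, 5)]
```

Working on lemma sets rather than surface forms mirrors the abstract's point that word forms are disregarded for a morphologically rich language such as Uzbek.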
