Title:Dataset of vocabulary in Uzbek primary education : extraction and analysis in case of the school corpus
Authors:Madatov, Khabibulla (Author)
Sattarova, Sapura (Author)
Vičič, Jernej (Author)
Files:RAZ_Madatov_Khabibulla_2025.pdf (342.87 KB)
MD5: B099D0590099A4FB7D1438D190B9CE01
URL: https://www.sciencedirect.com/science/article/pii/S2352340925000812
Language:English
Work type:Article
Typology:1.01 - Original Scientific Article
Organization:FAMNIT - Faculty of Mathematics, Science and Information Technologies
Abstract:The main goal of this research is to determine the number of new words that a primary school pupil should know or acquire during each academic year. To accomplish this, we created two datasets. The first was compiled from the "Explanatory Vocabulary of the Uzbek Language" (EDUL). The second was built from 35 primary school textbooks for grades 1-4 approved by the Ministry of Preschool and School Education of the Republic of Uzbekistan, and was named the "Uzbek Primary School Corpus" (UPSC) by the authors. Using the "Comparative Lemma Extraction Method" (CLEM) proposed by the authors, a vocabulary for grades 1-4 was created, and the number of new words that primary school pupils should learn each academic year was determined. Word forms were disregarded, because Uzbek is a morphologically rich language.
Keywords:Uzbek language, primary school, corpus construction, natural language processing (NLP), comparative Lemma extraction method
Publication date:03.02.2025
Year of publishing:2025
Number of pages:pp. 1-12
Numbering:Vol. 59, article 111349
PID:20.500.12556/RUP-21537
UDC:004.65:811.5
ISSN on article:2352-3409
DOI:10.1016/j.dib.2025.111349
COBISS.SI-ID:225129475
Publication date in RUP:08.08.2025

Record is a part of a journal

Title:Data in brief
Publisher:Elsevier
ISSN:2352-3409
COBISS.SI-ID:32117977

Document is financed by a project

Funder:EC - European Commission
Project number:739574
Name:Renewable materials and healthy environments research and innovation centre of excellence
Acronym:InnoRenew CoE

Funder:EC - European Commission
Project number:610170-EPP-1-2019-1-ES-EPPKA2-CBHE-JP
Name:Establishment of training and research centers and Courses development on Intelligent BigData Analysis in CA

Licences

License:CC BY 4.0, Creative Commons Attribution 4.0 International
Link:http://creativecommons.org/licenses/by/4.0/
Description:This is the standard Creative Commons license that gives others maximum freedom to do what they want with the work as long as they credit the author.

Secondary language

Language:Slovenian
Keywords:uzbeški jezik, osnovna šola, konstrukcija korpusa, obdelava naravnega jezika (NLP), metoda primerjalne ekstrakcije lem

