| Title: | Dataset of Uzbek base words : extraction and data analysis based on the school corpus |
|---|
| Authors: | Madatov, Khabibulla (Author); Khajibaeva, Surayyo (Author); Vičič, Jernej (Author) |
| Files: | RAZ_Madatov_Khabibulla_2026.pdf (2.42 MB) MD5: E8695F5F3B4D7C0E7DBD187A6B04293A
https://www.sciencedirect.com/science/article/pii/S2590291126003141?via%3Dihub
|
|---|
| Language: | English |
|---|
| Type of material: | Journal article |
|---|
| Typology: | 1.01 - Original scientific article |
|---|
| Organization: | FAMNIT - Faculty of Mathematics, Natural Sciences and Information Technologies
|
|---|
| Description: | The article presents a dataset of Uzbek base words extracted from a purpose-built corpus using the Synonym Thesaurus Support method. The method identifies base words for each school grade by analysing a large text corpus of 142 textbooks intended for school education in Uzbekistan. In this article and in the proposed dataset, a base word is defined as the word within a synonymic series that (1) is the most widely used, (2) is distinguished by semantic clarity and stability, and (3) is stylistically neutral. The school textbooks were analysed in three blocks: Primary (school grades 1-4), Basic Secondary (school grades 5-9), and Secondary (school grades 10-11). For each school grade, base words that stand out from the general corpus were identified; the method extracted new base words that are specific to the observed grade and not found in previous grades. The main idea of the method is to extract base words from the lemma set of each school grade using a corpus of synonyms. This makes it possible to analyse the level of lexical complexity and the class-specific vocabulary richness of texts intended for schoolchildren. The final results are lists of base words extracted from primary (school grades 1-4), basic secondary (school grades 5-9), and secondary (school grades 10-11) school texts, containing 17,599, 48,203, and 20,491 base words, respectively. |
|---|
| Keywords: | school corpus, base word, basic vocabulary, Uzbek language |
|---|
| Publication version: | Published version |
|---|
| Publication date: | 18.04.2026 |
|---|
| Year of publication: | 2026 |
|---|
| Number of pages: | pp. 1-9 |
|---|
| Numbering: | Vol. 13, article 102749 |
|---|
| PID: | 20.500.12556/RUP-23061  |
|---|
| UDC: | 81'322.4:811.5 |
|---|
| ISSN on article: | 2590-2911 |
|---|
| DOI: | 10.1016/j.ssaho.2026.102749  |
|---|
| COBISS.SI-ID: | 278726403  |
|---|
| Publication date in RUP: | 20.05.2026 |
|---|
| Views: | 33 |
|---|
| Downloads: | 4 |
|---|