Repository search

Results 1-10 of 87
1.
Dataset of sentiment tagged language resources for Macedonian language
Sofija Kochovska, Jernej Vičič, Branko Kavšek, 2026, original scientific article

Description: Macedonian is a South Slavic language spoken by about 2 million people, primarily in North Macedonia and in diaspora communities worldwide. It has a few distinctive features. Most notably, it uses definite articles attached to the end of nouns; for example, kniga (a book) becomes knigata (the book). Furthermore, it does not use grammatical cases, which makes its grammar relatively straightforward compared to other Slavic languages. The dataset comprises two lists of sentiment-annotated words that form the core of the Macedonian sentiment-annotated lexicon, a list of stopwords, a list of Affirmative and non-Affirmative words (AnAwords) composed mostly of intensifiers and diminishers, and a list of polarity shifters. The materials are intended mainly for rule-based sentiment analysis, but some of the lists have much broader uses.
Keywords: Macedonian language, sentiment analysis, sentiment lexicon, rule-based methods, natural language processing, low-resource languages, AnA words, stopwords, intensifiers, diminishers, polarity shifters
Published in RUP: 20.01.2026; Views: 47; Downloads: 2
Full text (.pdf, 251.79 KB)
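The way these lists can combine in a rule-based scorer is sketched below. This is a minimal illustration under stated assumptions: the toy entries, the AnAword weights (1.5 for an intensifier, 0.5 for a diminisher), and the rule that a modifier applies only to the next sentiment word are illustrative choices, not the dataset's actual contents or the authors' scoring scheme.

```python
# Hypothetical toy resources: illustrative entries, NOT taken from the dataset.
LEXICON = {"добар": 1.0, "лош": -1.0}     # sentiment-annotated words ("good", "bad")
ANA_WORDS = {"многу": 1.5, "малку": 0.5}  # intensifier / diminisher weights (assumed values)
SHIFTERS = {"не"}                         # polarity shifters ("not")
STOPWORDS = {"и", "е"}                    # stopwords ("and", "is")


def score(text: str) -> float:
    """Sum lexicon polarities, applying pending AnAword weights and negation."""
    total = 0.0
    weight = 1.0    # multiplier accumulated from intensifiers/diminishers
    negate = False  # set by a polarity shifter, consumed by the next sentiment word
    for token in text.lower().split():
        if token in STOPWORDS:
            continue
        if token in SHIFTERS:
            negate = True
        elif token in ANA_WORDS:
            weight *= ANA_WORDS[token]
        elif token in LEXICON:
            value = LEXICON[token] * weight
            total += -value if negate else value
            weight, negate = 1.0, False  # modifiers apply to one sentiment word only
    return total
```

For example, `score("не е добар")` skips the stopword "е", lets "не" flip the polarity of "добар", and returns -1.0.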

2.
TF-IDF-based classification of Uzbek educational texts
Khabibulla Madatov, Sapura Sattarova, Jernej Vičič, 2025, original scientific article

Description: This paper presents a baseline study on automatic Uzbek text classification. Uzbek is a morphologically rich and low-resource language, which makes reliable preprocessing and evaluation challenging. The approach integrates Term Frequency–Inverse Document Frequency (TF–IDF) representation with three conventional methods: linear regression (LR), k-Nearest Neighbors (k-NN), and cosine similarity (CS, implemented as a 1-NN retrieval model). The objective is to categorize school learning materials by grade level (grades 5–11) to support improved alignment between curricular texts and students’ intellectual development. A balanced dataset of Uzbek school textbooks across different subjects was constructed, preprocessed with standard NLP tools, and converted into TF–IDF vectors. Experimental results on the internal test set of 70 files show that LR achieved 92.9% accuracy (precision = 0.94, recall = 0.93, F1 = 0.93), while CS performed comparably with 91.4% accuracy (precision = 0.92, recall = 0.91, F1 = 0.92). In contrast, k-NN obtained only 28.6% accuracy, confirming its weakness in high-dimensional sparse feature spaces. External evaluation on seven Uzbek literary works further demonstrated that LR and CS yielded consistent and interpretable grade-level mappings, whereas k-NN results were unstable. Overall, the findings establish reliable baselines for Uzbek educational text classification and highlight the potential of extending beyond lexical overlap toward semantically richer models in future work.
Keywords: Uzbek language, text classification, low-resource languages, TF-IDF, cosine similarity, linear regression, k-Nearest Neighbors
Published in RUP: 17.10.2025; Views: 330; Downloads: 3
Full text (.pdf, 286.87 KB)
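The cosine-similarity (CS) baseline can be read as TF–IDF vectors plus a 1-nearest-neighbour lookup. The sketch below assumes whitespace tokenization, a smoothed IDF of the form log((1 + N) / (1 + df)) + 1, and English toy documents standing in for preprocessed Uzbek textbook text. None of these details come from the paper, and `classify_1nn` is a hypothetical helper.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF over the training documents: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(text, idf):
    """Sparse TF-IDF vector; terms never seen in training are ignored."""
    tokens = tokenize(text)
    counts = Counter(t for t in tokens if t in idf)
    return {t: (c / len(tokens)) * idf[t] for t, c in counts.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def classify_1nn(train, query):
    """train: list of (label, text). Returns the label of the most similar document."""
    idf = fit_idf([text for _, text in train])
    vectors = [(label, tfidf(text, idf)) for label, text in train]
    q = tfidf(query, idf)
    best_label, _ = max(vectors, key=lambda item: cosine(q, item[1]))
    return best_label
```

Because only terms shared with a training document contribute to the dot product, this baseline rewards lexical overlap, which is the limitation the abstract points to for future work.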

3.
Is open source the future of AI? : a data-driven approach
Domen Vake, Bogdan Šinik, Jernej Vičič, Aleksandar Tošić, 2025, original scientific article

Description: Large language models (LLMs) have become central to both academic research and industrial applications, fueling debates on their accuracy, usability, privacy, and potential misuse. While proprietary models benefit from substantial investments in data and computing resources, open-sourcing is often suggested as a means to enhance trust and transparency. Yet open-sourcing comes with its own challenges, such as risks of illicit applications, limited financial incentives, and intellectual property concerns. Positioned between these extremes are hybrid approaches, including partially open models and licensing restrictions, that aim to balance openness with control. In this paper, we adopt a data-driven approach to examine the open-source development of LLMs. By analyzing contributions in model improvements, modifications, and methodologies, we assess how community efforts affect model performance. Our findings indicate that the open-source community can significantly enhance models, demonstrating that community-driven modifications can yield efficiency gains without compromising performance. Moreover, our analysis reveals distinct trends in community growth and highlights which architectures benefit disproportionately from open-source engagement. These insights provide an empirical foundation to inform balanced discussions among industry experts and policymakers on the future direction of AI development.
Keywords: large language models, artificial intelligence, open source, data science, HuggingFace
Published in RUP: 25.09.2025; Views: 521; Downloads: 4
Full text (.pdf, 606.12 KB)

4.
Text continuation with simple neural networks : final project paper
Žan Škrabl, 2025, bachelor's thesis

Keywords: neural networks, text continuation, LSTM, final project papers
Published in RUP: 06.09.2025; Views: 394; Downloads: 0
Full text (.pdf, 986.48 KB)

5.
Dataset of vocabulary in Uzbek primary education : extraction and analysis in case of the school corpus
Khabibulla Madatov, Sapura Sattarova, Jernej Vičič, 2025, original scientific article

Description: The main goal of this research is to determine the number of new words that a primary school pupil should acquire during each academic year. To this end, we created two datasets. The first was compiled from the "Explanatory Vocabulary of the Uzbek Language" (EDUL). The second was built from 35 primary school textbooks for grades 1-4 approved by the Ministry of Preschool and School Education of the Republic of Uzbekistan; the authors named it the "Uzbek Primary School Corpus" (UPSC). Using the "Comparative Lemma Extraction Method" (CLEM) proposed by the authors, a vocabulary for grades 1-4 was created, and the number of new words that primary school pupils should learn in each academic year was determined (counting lemmas and disregarding word forms, since Uzbek is a morphologically rich language).
Keywords: Uzbek language, primary school, corpus construction, natural language processing (NLP), comparative lemma extraction method
Published in RUP: 08.08.2025; Views: 540; Downloads: 7
Full text (.pdf, 342.87 KB)
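The abstract does not spell out CLEM step by step. One plausible reading of "new words per grade", sketched here purely as an assumption, is to keep the lemmas from each grade's textbooks that also appear in the dictionary, then discard those already introduced in earlier grades. Lemmatization is assumed to happen upstream, and `new_lemmas_per_grade` is a hypothetical helper, not the authors' code.

```python
def new_lemmas_per_grade(grade_lemmas, dictionary):
    """grade_lemmas: {grade: iterable of lemmas from that grade's textbooks}.
    dictionary: set of headwords (e.g. from an explanatory dictionary).
    Returns {grade: set of dictionary lemmas first seen in that grade}."""
    seen = set()
    result = {}
    for grade in sorted(grade_lemmas):
        lemmas = set(grade_lemmas[grade]) & dictionary  # keep only dictionary headwords
        result[grade] = lemmas - seen                   # drop lemmas from earlier grades
        seen |= lemmas
    return result
```

With toy input where grade 1 contains kitob ("book") and ona ("mother"), and grade 2 repeats kitob and adds maktab ("school"), grade 2 contributes only maktab as a new word.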

6.
Bridging the question–answer gap in retrieval-augmented generation : hypothetical prompt embeddings
Domen Vake, Jernej Vičič, Aleksandar Tošić, 2025, original scientific article

Description: Retrieval-Augmented Generation (RAG) systems combine retrieval mechanisms with generative language models to improve the accuracy and relevance of responses. However, bridging the style gap between user queries and the relevant information in document text remains a persistent challenge in retrieval-augmented systems. It is often addressed by runtime solutions (e.g., Hypothetical Document Embeddings (HyDE)) that improve alignment but add computational overhead at query time. To address this, we propose Hypothetical Prompt Embeddings (HyPE), a framework that shifts the generation of hypothetical content from query time to the indexing phase. By precomputing multiple hypothetical prompts for each data chunk and embedding the chunk in place of the prompt, HyPE transforms retrieval into a question-question matching task, bypassing the need for runtime synthetic answer generation. This approach not only avoids additional query-time latency but also strengthens the alignment between queries and relevant context. Our experimental results on six common datasets show that HyPE can improve retrieval context precision by up to 42 percentage points and claim recall by up to 45 percentage points compared to standard approaches, while remaining compatible with re-ranking, multi-vector retrieval, query decomposition, and other RAG advancements.
Keywords: LLM, hypothetical prompt embedding, Retrieval-Augmented Generation (RAG)
Published in RUP: 04.08.2025; Views: 574; Downloads: 4
Full text (.pdf, 1.41 MB)
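The split between index time and query time can be sketched in a few lines. In this illustration the hypothetical prompts are hard-coded and `embed` is a toy bag-of-words vector. In a real HyPE pipeline an LLM would generate the prompts and an embedding model would encode them. Each stored vector belongs to a question but points back to its source chunk, so retrieval becomes question-to-question matching with no generation step at query time.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model: lowercase bag-of-words counts.
    return Counter(text.lower().replace("?", "").split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # Counter yields 0 for missing terms
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(chunks_with_questions):
    """Index time: embed every hypothetical question and keep a pointer to its chunk."""
    return [(embed(q), chunk) for chunk, questions in chunks_with_questions for q in questions]


def retrieve(index, query, k=1):
    """Query time: match the query against stored questions; return up to k distinct chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[0]), reverse=True)
    results = []
    for _, chunk in ranked:
        if chunk not in results:
            results.append(chunk)
        if len(results) == k:
            break
    return results
```

Because several questions can point to the same chunk, `retrieve` deduplicates chunks before applying the `k` limit.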

7.
Application of Benford’s law on environmental data : master’s thesis
Bogdan Šinik, 2025, master's thesis

Keywords: anomaly detection, Benford’s law, data integrity, life-cycle assessment
Published in RUP: 06.07.2025; Views: 884; Downloads: 18
Full text (.pdf, 487.80 KB)
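Benford's law predicts that the leading digit d of many naturally occurring datasets appears with probability P(d) = log10(1 + 1/d). Under this law about 30.1% of values start with 1 and about 4.6% start with 9. A common conformity check compares the observed first-digit shares with these expectations, for example via the mean absolute deviation (MAD). The sketch below is a generic illustration of that check, not the method used in the thesis.

```python
import math
from collections import Counter


def benford_expected():
    """Benford first-digit probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant digit of a finite non-zero number (int or float)."""
    return int(next(ch for ch in str(abs(x)) if ch in "123456789"))


def benford_mad(values):
    """Mean absolute deviation between observed first-digit shares and Benford's law."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    n = len(digits)
    expected = benford_expected()
    return sum(abs(counts[d] / n - expected[d]) for d in range(1, 10)) / 9
```

A low MAD indicates close conformity. Data whose first digits cluster unnaturally, for instance values that all start with 9, produce a large MAD and could be flagged for closer inspection.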

8.
Dataset of Uzbek verbs with formation and suffixes
Maksud Sharipov, Jernej Vičič, 2025, other scientific articles

Description: The main goal of this work is to create a dataset of Uzbek verbs. For each verb, the dataset records the word it is derived from and the affixes used in the derivation, and the affixes are classified into distinct categories. With this dataset, it is possible to determine from which part of speech each Uzbek verb is derived and with which affixes. It also supports identifying verbs in Uzbek texts and developing rule-based models for their analysis. Additionally, it can serve as a basis for building artificial intelligence models for the morphological and syntactic analysis of Uzbek texts. Because verbs play a crucial role in learning any language, students in schools and higher education institutions can also use the dataset in their studies. The dataset is a valuable resource for researchers and practitioners working on Uzbek language processing.
Keywords: verb phrase, Uzbek language, Uzbek web corpus, verb form, verb affixes
Published in RUP: 02.06.2025; Views: 791; Downloads: 13
Full text (.pdf, 281.07 KB)
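The sketch below shows how derivation records like these could drive a rule-based analysis. It strips the infinitive ending -moq, tries known verb-forming affixes, and looks up the remaining base. The schema and toy entries are illustrative assumptions, not the published dataset's format or contents. The entries are ish ("work", noun) and toza ("clean", adjective), each taking the affix -la, giving ishlamoq "to work" and tozalamoq "to clean".

```python
# Toy derivation resources (hypothetical; not the published dataset's schema or contents).
BASES = {"ish": "noun", "toza": "adjective"}  # base word -> part of speech
VERB_AFFIXES = ["la"]                         # verb-forming affixes, longest first
INFINITIVE_SUFFIX = "moq"


def analyze_verb(verb):
    """Return (base, part_of_speech, affix) for a derived infinitive, or None."""
    if not verb.endswith(INFINITIVE_SUFFIX):
        return None
    stem = verb[: -len(INFINITIVE_SUFFIX)]
    for affix in VERB_AFFIXES:
        if stem.endswith(affix):
            base = stem[: -len(affix)]
            if base in BASES:
                return base, BASES[base], affix
    return None
```

Root verbs such as yozmoq ("to write") fall through and return `None`, because no listed affix matches their stem.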

9.
Occupancy estimation using indoor air quality data : opportunities and privacy implications
Domen Vake, Niki Hrovatin, Jernej Vičič, Aleksandar Tošić, 2025, original scientific article

Description: Indoor Air Quality (IAQ) has long been a significant concern because of its health-related risks and potential benefits. Air quality sensors are now affordable and readily available, and they have been installed in many buildings, most prominently public buildings. The dynamics of IAQ are commonly studied in relation to construction materials, building design, room use, and effects on occupants. However, beyond what the sensors were designed to measure, it is possible to infer other information. In this paper, we present a Machine Learning (ML) model that predicts the presence of people in a room with an accuracy of up to 93% and the exact number of occupants with a mean absolute error (MAE) of 2.17. We validate the proposed approach in the use case of an elementary school in Slovenia. In collaboration with the elementary school in Ajdovščina, 8 air quality sensors were placed in classrooms, and air quality parameters (VOC, CO, temperature, and humidity) were monitored for 6 months. During the monitoring period, school staff collected anonymous data about classroom occupancy. The indoor air quality data were paired with external weather data and occupancy records to train the model. We also compare our approach with other commonly used ML approaches and report results for our use case. Finally, these results highlight the privacy concerns of structural monitoring, given the demonstrated ability to infer potentially sensitive information.
Keywords: indoor air quality, occupancy estimation, machine learning, sensor networks, privacy, building monitoring
Published in RUP: 02.06.2025; Views: 1059; Downloads: 7
Full text (.pdf, 3.66 MB)
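The two reported figures measure different tasks. Presence is a binary classification scored by accuracy, while the occupant count is a regression scored by mean absolute error. The minimal sketch below computes both metrics. The helper names and the toy values in the example are hypothetical; the code is not taken from the paper.

```python
def presence_accuracy(true_counts, predicted_present):
    """Share of samples whose predicted presence flag matches (true count > 0)."""
    if len(true_counts) != len(predicted_present):
        raise ValueError("inputs must have equal length")
    hits = sum((t > 0) == p for t, p in zip(true_counts, predicted_present))
    return hits / len(true_counts)


def mean_absolute_error(true_counts, predicted_counts):
    """Average absolute difference between true and predicted occupant counts."""
    if len(true_counts) != len(predicted_counts):
        raise ValueError("inputs must have equal length")
    return sum(abs(t - p) for t, p in zip(true_counts, predicted_counts)) / len(true_counts)
```

An MAE of 2.17 therefore means the predicted headcount is off by about two occupants on average.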

10.
Partners: University of Maribor, University of Ljubljana, University of Primorska, University of Nova Gorica