Please use this identifier to cite or link to this item: https://gnanaganga.inflibnet.ac.in:8443/jspui/handle/123456789/16900
Title: Word Sense Disambiguation Corpus for Kashmiri
Authors: Mir, Tawseef Ahmad
Lawaye, Aadil Ahmad
Keywords: Information Extraction
Machine Learning
Sense Annotation
Word Sense Disambiguation
Issue Date: 2024
Publisher: Natural Language Processing
Cambridge Univ Press
Abstract: Ambiguity is considered an indispensable attribute of all natural languages. The process of associating the precise interpretation to an ambiguous word taking into consideration the context in which it occurs is known as word sense disambiguation (WSD). Supervised approaches to WSD are showing better performance in contrast to their counterparts. These approaches, however, require sense annotated corpus to carry out the disambiguation process. This paper presents the first-ever standard WSD dataset for the Kashmiri language. The raw corpus used to develop the sense annotated dataset is collected from different resources and contains about 1 M tokens. The sense-annotated corpus is then created using this raw corpus for 124 commonly used ambiguous Kashmiri words. Kashmiri WordNet, an important lexical resource for the Kashmiri language, is used for obtaining the senses used in the annotation process. The developed sense-tagged corpus is multifarious in nature and has 19,854 sentences. Based on this annotated corpus, the Lexical Sample WSD task for Kashmiri is carried out using different machine-learning algorithms (J48, IBk, Naive Bayes, Dl4jMlpClassifier, SVM). To train these models for the WSD task, bag-of-words (BoW) and word embeddings obtained using the Word2Vec model are used. We used different standard measures, viz. accuracy, precision, recall, and F1-measure, to calculate the performance of these algorithms. Different machine learning algorithms reported different values for these measures on using different features. In the case of BoW model, SVM reported better results than other algorithms used, whereas Dl4jMlpClassifier performed better with word embeddings.
URI: https://doi.org/10.1017/nlp.2024.31
https://gnanaganga.inflibnet.ac.in:8443/jspui/handle/123456789/16900
ISSN: 2977-0424
Appears in Collections:Journal Articles

Files in This Item:
File SizeFormat 
word-sense-disambiguation-corpus-for-kashmiri.pdf10.68 MBAdobe PDFView/Open


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.