Please use this identifier to cite or link to this item: https://gnanaganga.inflibnet.ac.in:8443/jspui/handle/123456789/16900
Full metadata record
DC Field | Value | Language
dc.contributor.author | Mir, Tawseef Ahmad | -
dc.contributor.author | Lawaye, Aadil Ahmad | -
dc.date.accessioned | 2024-12-12T09:38:21Z | -
dc.date.available | 2024-12-12T09:38:21Z | -
dc.date.issued | 2024 | -
dc.identifier.issn | 2977-0424 | -
dc.identifier.uri | https://doi.org/10.1017/nlp.2024.31 | -
dc.identifier.uri | https://gnanaganga.inflibnet.ac.in:8443/jspui/handle/123456789/16900 | -
dc.description.abstract | Ambiguity is considered an indispensable attribute of all natural languages. The process of associating the precise interpretation to an ambiguous word taking into consideration the context in which it occurs is known as word sense disambiguation (WSD). Supervised approaches to WSD are showing better performance in contrast to their counterparts. These approaches, however, require sense annotated corpus to carry out the disambiguation process. This paper presents the first-ever standard WSD dataset for the Kashmiri language. The raw corpus used to develop the sense annotated dataset is collected from different resources and contains about 1 M tokens. The sense-annotated corpus is then created using this raw corpus for 124 commonly used ambiguous Kashmiri words. Kashmiri WordNet, an important lexical resource for the Kashmiri language, is used for obtaining the senses used in the annotation process. The developed sense-tagged corpus is multifarious in nature and has 19,854 sentences. Based on this annotated corpus, the Lexical Sample WSD task for Kashmiri is carried out using different machine-learning algorithms (J48, IBk, Naive Bayes, Dl4jMlpClassifier, SVM). To train these models for the WSD task, bag-of-words (BoW) and word embeddings obtained using the Word2Vec model are used. We used different standard measures, viz. accuracy, precision, recall, and F1-measure, to calculate the performance of these algorithms. Different machine learning algorithms reported different values for these measures on using different features. In the case of BoW model, SVM reported better results than other algorithms used, whereas Dl4jMlpClassifier performed better with word embeddings. | en_US
dc.language.iso | en | en_US
dc.publisher | Natural Language Processing | en_US
dc.publisher | Cambridge Univ Press | en_US
dc.subject | Information Extraction | en_US
dc.subject | Machine Learning | en_US
dc.subject | Sense Annotation | en_US
dc.subject | Word Sense Disambiguation | en_US
dc.title | Word Sense Disambiguation Corpus for Kashmiri | en_US
dc.type | Article | en_US
Appears in Collections: Journal Articles

Files in This Item:
File | Size | Format
word-sense-disambiguation-corpus-for-kashmiri.pdf | 10.68 MB | Adobe PDF

