Development of Diacritical Marks to Punjabi Shahmukhi nouns and verbs.

  • Muhammad Ahmad Hashmi
  • Muhammad Asim Mahmood
  • Muhammad Ilyas Mahmood
Keywords: digitization, diacritics, Natural Language Processing (NLP), Punjabi Shahmukhi, WordNet

Abstract

This study applies diacritical marks to 1,000 Punjabi Shahmukhi words, comprising 800 nouns and 200 verbs. The corpus of 2 million words was compiled from books, newspapers, magazines, articles and novels. Punjabi Shahmukhi lacks the online digital resources needed to develop Natural Language Processing (NLP) tools, which would help the language gain international recognition. Punjabi Shahmukhi is written in the Perso-Arabic script, and linguists have neglected the digitization of its literature. The study is significant because it contributes to the development of a WordNet and a Part of Speech (POS) tagger for Punjabi Shahmukhi, supports the digitization of Punjabi Shahmukhi literature, and will help teachers and non-native speakers, fostering intercultural harmony.

Published
2019-08-01