Corpus-Based Study on the Vocabulary Profile of the Shahmukhi Punjabi Language
Abstract
This research develops a Vocabulary Profile (VP) of Shahmukhi Punjabi by compiling a corpus of two million words. The Shahmukhi Punjabi corpus was transliterated into Gurmukhi Punjabi for part-of-speech (POS) tagging and was analyzed with the help of AntConc. A frequency list and lists of vocabulary items grouped by grammatical category were studied in the developed corpus. It was observed that Punjabi words have many different cases and forms, in contrast to English and similarly to Urdu. Nouns, verbs and adjectives vary according to number and gender. Abbreviations and loan words from English were also found in the corpus.
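As an illustration of what a frequency list is, the following minimal Python sketch counts whitespace-separated tokens and ranks them by frequency. It is not the procedure used in this study, which relied on AntConc; the function name `frequency_list` and the placeholder tokens `w1`, `w2`, `w3` are assumptions made for the example, and real Shahmukhi text would likely need more careful tokenization than splitting on whitespace.

```python
from collections import Counter


def frequency_list(text):
    """Return (token, count) pairs, most frequent first.

    Tokens are split on whitespace only (a simplifying assumption);
    ties in count are broken by sorting the tokens alphabetically.
    """
    tokens = text.split()
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Placeholder tokens standing in for corpus words (hypothetical data).
sample = "w1 w2 w1 w3 w2 w1"
freq = frequency_list(sample)

for token, count in freq:
    print(token, count)
```

The same ranking applied separately to the tokens of each POS tag would give one list per grammatical category.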