Morphological Rules Pertaining to Persian Spell Checking
“Analyzing Persian texts as stemming algorithms do is essential for efficient spell checking, because it provides the level of consistency needed and it can work with a concise lexicon.
In this article the morphological rules pertaining to such algorithms are studied.”
In Persian, words are extensively combined with various prefixes and suffixes to make new words. In this sense, and if we define words digitally as strings of characters surrounded by spaces, the number of Persian words is enormously larger than in Latin-script languages such as English. For example, the word كتاب (ketab = book) generates the following derivatives:
|ketab_am||my book, I am a book|
|ketab_i||a book, you are a book, related to books|
|ketab_im||we are books|
|ketab_id||you are books|
|ketab_and||they are books|
|ketab_itar||more related to books|
|ketab_itarin||most related to books|
|*The suffixes are presented just as they are spelled in Persian.|
As seen in this example, 19 different words can be made from the simple root “ketab”.
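The derivation shown in the table can be sketched in a few lines of Python. This is only an illustration: it uses the Latin transliteration and the underscore separator from the table, it covers only the seven suffixes listed there, and the names `SUFFIXES` and `derive` are hypothetical.

```python
# Suffixes copied from the table above (Latin transliteration).
# Only the seven listed suffixes are included; this is not a complete set.
SUFFIXES = ["am", "i", "im", "id", "and", "itar", "itarin"]


def derive(root, suffixes=SUFFIXES):
    """Return the surface forms made by attaching each suffix to the root."""
    # The underscore mirrors the notation of the table; it is not Persian spelling.
    return [f"{root}_{suffix}" for suffix in suffixes]


forms = derive("ketab")
for form in forms:
    print(form)
```

Each generated string counts as a separate word under the digital definition used in this article, which is why a lexicon of full forms grows so quickly.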
The term Morphological Rules then refers to those rules in Persian that specify how new words can be made. It should be noted that, by making words, we do not mean the process of coining totally new words, as is usually meant in Persian literature. In fact, no one speaks of ‘ketab_ha’ as a new word made from ‘ketab’. It counts as one here only because of our digital definition of a word: “a string of letters separated by space”.
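A minimal sketch of this digital definition, assuming that splitting on whitespace is an adequate approximation (the function name `words` is hypothetical):

```python
def words(text):
    """Split text into 'digital words': strings of characters separated by space."""
    # str.split() with no argument splits on any run of whitespace.
    return text.split()


tokens = words("ketab ketab_ha ketab_am")
distinct = set(tokens)
```

Under this definition ‘ketab’, ‘ketab_ha’ and ‘ketab_am’ are three distinct words, even though a grammarian would treat them as forms of one word.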
Thus, we confine ourselves here to those simple and certain rules that are thought to be useful in the process of digital proofing.
The term certain is important because we are not going to consider patterns that are rarely used. We consider only those rules that can be applied in almost all cases. Nevertheless, the rules apply to words according to their grammatical nature. For example, a pronoun cannot be pluralized, and only verbs can be conjugated.
Thus, it should be assumed that the Morphological Rules studied here are supported by a Lexicon in which Morphemes are stored with flags that designate their grammatical nature as it pertains to the stated rules. The terms Flag and Morpheme in this article refer to such a Lexicon…
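One way to picture such a flagged Lexicon is sketched below. Everything in it is an assumption made for illustration: the entries, the flag names, the rule table and the function `is_valid` are hypothetical, and the underscore stands in for the morpheme boundary as in the transliterations above.

```python
# Hypothetical lexicon: each morpheme is stored with flags for its grammatical nature.
LEXICON = {
    "ketab": {"noun"},    # book
    "man": {"pronoun"},   # I
}

# Hypothetical rules: each suffix lists the flags a stem needs for it to attach.
SUFFIX_RULES = {
    "ha": {"noun"},       # plural marker, as in ketab_ha
    "am": {"noun"},
    "itar": {"noun"},
}


def is_valid(word):
    """Accept a word if it is a stored morpheme or a stem plus a permitted suffix."""
    if word in LEXICON:
        return True
    stem, separator, suffix = word.rpartition("_")
    if not separator:
        return False  # no morpheme boundary and not in the lexicon
    stem_flags = LEXICON.get(stem)
    required_flags = SUFFIX_RULES.get(suffix)
    if stem_flags is None or required_flags is None:
        return False  # unknown stem or unknown suffix
    # The rule applies only if the stem carries one of the required flags.
    return bool(stem_flags & required_flags)


checks = {w: is_valid(w) for w in ["ketab", "ketab_ha", "man_ha", "ketab_xyz"]}
```

With these entries, ‘ketab_ha’ is accepted because ‘ketab’ is flagged as a noun, while ‘man_ha’ is rejected because a pronoun cannot be pluralized. The lexicon itself stays concise: it holds only the morphemes, not every derived form.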