Language Technology
Morphological Analyzer
The preprocessing stage of most Natural Language Processing projects includes analysing words and their internal structure, and a morphological analyser is the program that handles this step. It plays a crucial role in applications such as chatbots, plagiarism checkers and dependency parsers.
Morphology is the study of words and their internal structure; one of its basic tasks is identifying the stem of a word from its full form. A morphological analyser is a program that takes an input word or text, detects its morphemes, and analyses them. Malayalam is a highly agglutinative language, in which numerous suffixes can be joined to a root word to produce a compound word. The main challenge faced by most Malayalam language processing projects is preprocessing the data: the language is rich in inflections and stacked suffixes, which presents a major obstacle at this stage. A morphological analyser is therefore a necessary supporting tool for almost all such projects.
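To make the agglutination concrete, the toy sketch below joins the plural suffix -kal and the locative suffix -il to the root poo ("flower"), as in പൂ + കൾ → പൂക്കൾ (pookkal, "flowers"), where the first consonant of the suffix doubles. It is a hypothetical illustration, not ICFOSS code: it uses a simplified Latin transliteration instead of Malayalam script, and the rule that -kal always doubles is an assumption (in Malayalam the doubling depends on the stem). Naively stripping the same suffixes back off leaves "pook" rather than the root, which is why sandhi has to be handled during stemming.

```python
# Hypothetical illustration in a simplified Latin transliteration.
# Assumption: the plural suffix "kal" always doubles its first letter on joining.
DOUBLING_SUFFIXES = {"kal"}


def join(root: str, suffixes: list[str]) -> str:
    """Agglutinate suffixes onto a root, doubling the first letter of doubling suffixes."""
    word = root
    for suffix in suffixes:
        if suffix in DOUBLING_SUFFIXES:
            word += suffix[0]  # sandhi: poo + kal -> pookkal
        word += suffix
    return word


def naive_strip(word: str, suffixes: list[str]) -> str:
    """Remove suffixes from the end of the word without any sandhi handling."""
    for suffix in reversed(suffixes):
        word = word.removesuffix(suffix)  # Python 3.9+
    return word


word = join("poo", ["kal", "il"])
print(word)                              # pookkalil
print(naive_strip(word, ["kal", "il"]))  # pook (not the root "poo")
```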
ICFOSS has developed a web-based morphological analyser tool for Malayalam. The system uses recursive suffix stripping and also takes sandhi rules into account during stemming. It follows a rule- and dictionary-based approach that combines four modules. The first three modules, ‘stem’, ‘split’ and ‘dvithva’, are suffix-stripping modules: they strip suffixes until none is left at the end of the word, and they support one another during stemming. The system uses four dictionaries: one of valid suffixes, one of suffixes whose first letter doubles when joined to a word, one of root words and one of verbs. These dictionaries are of utmost importance, since the accuracy of the analyser depends on them. The word that remains after stripping is looked up in the root-word dictionary, and finally the fourth module, ‘tag’, assigns tags to the suffixes. The result is the root word together with the list of suffixes and their tags.
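The pipeline described above (recursive suffix stripping, undoing a doubled first letter, root-dictionary lookup, then tagging) can be sketched as follows. This is a minimal sketch under stated assumptions, not the ICFOSS implementation: the dictionaries are tiny hypothetical sets in a simplified Latin transliteration, the tag names `PLURAL`, `DATIVE` and `LOCATIVE` are invented for illustration, the verb dictionary is omitted, and the work is done by one function rather than being divided among ‘stem’, ‘split’, ‘dvithva’ and ‘tag’.

```python
from typing import Optional

# Hypothetical toy dictionaries (simplified Latin transliteration).
SUFFIX_TAGS = {"kal": "PLURAL", "kku": "DATIVE", "il": "LOCATIVE"}  # valid suffixes + tags
DOUBLING_SUFFIXES = {"kal"}  # suffixes whose first letter doubles on joining
ROOT_WORDS = {"poo"}         # "poo" = flower

Analysis = tuple[str, list[tuple[str, str]]]


def analyse(word: str, stripped: tuple[str, ...] = ()) -> Optional[Analysis]:
    """Recursively strip suffixes until the remainder is a known root.

    Returns (root, [(suffix, tag), ...]) with suffixes ordered from the root
    outwards, or None if no segmentation ends in a dictionary root.
    """
    if word in ROOT_WORDS:
        # Lookup succeeded: tag every suffix stripped on the way here.
        return word, [(s, SUFFIX_TAGS[s]) for s in stripped]
    # Try longer suffixes first (a common heuristic when suffixes overlap).
    for suffix in sorted(SUFFIX_TAGS, key=len, reverse=True):
        if not word.endswith(suffix) or len(word) == len(suffix):
            continue
        stem = word[: -len(suffix)]
        candidates = [stem]
        # Sandhi (dvithva): also try the stem with the doubled letter removed,
        # e.g. "pook" -> "poo" after stripping "kal".
        if suffix in DOUBLING_SUFFIXES and stem.endswith(suffix[0]):
            candidates.append(stem[:-1])
        for candidate in candidates:
            result = analyse(candidate, (suffix,) + stripped)
            if result is not None:
                return result
    return None


print(analyse("pookkalil"))   # ('poo', [('kal', 'PLURAL'), ('il', 'LOCATIVE')])
print(analyse("pookkalkku"))  # ('poo', [('kal', 'PLURAL'), ('kku', 'DATIVE')])
```

Trying the plain stem before the de-doubled one lets the same code handle suffixes that do not trigger doubling. A real analyser would also need other sandhi changes at the stem boundary, which this sketch leaves out.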