VUSuperior Chat Room

Saturday, 2 May 2015

Urdu Stemmer CS619 Final Project Spring 2015

Urdu Stemmer


Stemming is the term used in linguistic morphology and information retrieval to describe the process of reducing inflected (or sometimes derived) words to their word stem, base, or root form, generally a written word form.

(Wikipedia)



The goal of both stemming and lemmatization is to reduce inflectional forms, and sometimes derivationally related forms, of a word to a common base form. For instance:

am, are, is → be

car, cars, car's, cars' → car

(Stanford)
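To make the definition concrete, here is a minimal suffix-stripping sketch for the car example above. The suffix list is a hypothetical illustration, not a real stemmer such as Porter's. The "am, are, is → be" example cannot be handled by stripping affixes at all: irregular forms like these need a dictionary lookup, which is what lemmatization does.

```python
# Minimal suffix stripper for the English "car" example.
# ASSUMPTION: this three-entry suffix list is a hypothetical illustration only.
SUFFIXES = ["s'", "'s", "s"]


def strip_suffix(word: str) -> str:
    # Try longer suffixes first so the longest match wins
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        # Only strip if a non-empty stem remains
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


print([strip_suffix(w) for w in ["car", "cars", "car's", "cars'"]])
# → ['car', 'car', 'car', 'car']
```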



This application will apply stemming to Urdu words. Given an input of one or more words separated by spaces, it will reduce each word to its base form.



Input: باتوں باتیں    Output: بات
Input: لاحاصل    Output: حاصل
Input: لازوال    Output: زوال
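As a rough illustration of how affix stripping produces outputs like those in the examples above, the Python sketch below splits the input on whitespace, removes a known prefix and suffix from each word, and leaves words in an exception list unchanged. The prefix, suffix, and exception lists, the MIN_STEM limit, and the function names are hypothetical choices for this sketch only; they are not the lists or rules from the research paper, which your implementation must follow.

```python
# Minimal affix-stripping sketch for the Urdu examples.
# ASSUMPTION: these tiny lists and the MIN_STEM rule are hypothetical samples,
# NOT the affix/exception lists or rules from the assigned research paper.

PREFIXES = ["لا"]           # e.g. لاحاصل -> حاصل
SUFFIXES = ["وں", "یں"]     # e.g. باتوں -> بات, باتیں -> بات
EXCEPTIONS = {"لاہور"}      # Lahore: starts with لا but must not be stripped
MIN_STEM = 2                # never leave a stem shorter than this


def strip_prefix(word: str) -> str:
    # Try longer prefixes first so the longest match wins
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            return word[len(prefix):]
    return word


def strip_suffix(word: str) -> str:
    # Try longer suffixes first so the longest match wins
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def stem(word: str) -> str:
    # Words in the exception list are returned unchanged
    if word in EXCEPTIONS:
        return word
    return strip_suffix(strip_prefix(word))


def stem_text(text: str) -> list[str]:
    # Input words are separated by whitespace, as the assignment specifies
    return [stem(w) for w in text.split()]


print(stem_text("باتوں باتیں لاحاصل لازوال لاہور"))
```

Running it yields the stems بات, بات, حاصل and زوال for the first four words, while لاہور (Lahore) is returned unchanged because it is in the exception list. Without that entry, the prefix rule would wrongly cut it down to ہور, which shows why an exception list is useful alongside the affix lists.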


The complete method for implementing such a stemmer is given in the following research paper. Read and understand it completely first, then implement the stemmer exactly as described. Your final deliverable will be marked on how faithfully your application implements the approach from the paper:



http://www.aclweb.org/anthology/W09-34#page=50



You must implement everything yourself; ready-made or built-in solutions for any part of the application will not be accepted.



Don’t forget to cite this paper at the appropriate place in your final report.



Tools: Java, Microsoft .NET, Python, or any other modern programming language; SQL Server, MS Access, MySQL, Oracle, or any other DBMS.



Supervisor Name: Usman Waheed

Email ID: usman.waheed@vu.edu.pk
