Multi-way sentiment classification of Arabic reviews

The evolution of the Web and the appearance of new technologies have given Internet users new ways to express their opinions and feelings about different aspects of life. These expressions are written in natural language and are unstructured, yet they hold a great deal of knowledge about users' opinions of and reactions to various subjects. As a result, a new field called Sentiment Analysis (SA) has emerged to address the complicated task of extracting such opinions or sentiments from the massive pool of unstructured text available online. Traditional work on SA considers only two sentiments: positive and negative. Multi-way SA instead considers sentiments expressed using a star or rating system. For example, in a 5-star rating system, the user's opinion ranges from very negative (1 star) to very positive (5 stars). This version of SA is considerably harder to handle, which partly explains the limited number of works on it. Moreover, this work focuses on the Arabic language, which is largely understudied compared to English. A new and relatively large Arabic dataset is used: the Large Arabic Book Reviews (LABR) dataset, gathered from an online book review website. The objective of this work is to perform baseline experiments on this dataset by applying the Bag-of-Words (BoW) representation coupled with the most popular classifiers. We also investigate the effect of stemming and of balancing the dataset. The obtained accuracies are low, confirming the intuition that the multi-way SA problem is very difficult and needs further attention.
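To make the baseline setup concrete, here is a minimal sketch, in standard-library Python only, of a Bag-of-Words classifier for 5-class (1 to 5 star) reviews, together with one way of balancing the data. Several assumptions apply. Multinomial Naive Bayes is used only as an example of a common classifier and is not necessarily the configuration evaluated in this work. Balancing is done by random undersampling to the smallest class, which is one possible strategy among several. Tokenization is plain whitespace splitting, with no stemming or Arabic normalization. The names `tokenize`, `undersample`, and `MultinomialNB` are hypothetical helpers defined here, and the toy reviews are illustrative, not LABR data.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace tokenization only; no stemming or normalization.
    return text.split()


def undersample(docs, labels, seed=0):
    """Hypothetical balancing step: randomly undersample every class
    down to the size of the smallest class."""
    by_class = defaultdict(list)
    for doc, y in zip(docs, labels):
        by_class[y].append(doc)
    n = min(len(v) for v in by_class.values())
    rng = random.Random(seed)
    out_docs, out_labels = [], []
    for y in sorted(by_class):
        for doc in rng.sample(by_class[y], n):
            out_docs.append(doc)
            out_labels.append(y)
    return out_docs, out_labels


class MultinomialNB:
    """Bag-of-Words multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        class_counts = Counter(labels)
        for doc, y in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[y].update(tokens)  # BoW: token counts per class
            self.vocab.update(tokens)
        total = len(labels)
        self.log_prior = {c: math.log(class_counts[c] / total) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            denom = self.total_words[c] + self.alpha * v
            for tok in tokenize(doc):
                if tok in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


# Toy, illustrative reviews labeled with 1-5 stars (not from LABR).
train_docs = ["كتاب رائع جدا", "رائع", "كتاب جيد", "جيد", "كتاب عادي",
              "عادي", "كتاب سيء", "سيء", "كتاب سيء جدا ممل", "ممل جدا"]
train_labels = [5, 5, 4, 4, 3, 3, 2, 2, 1, 1]
model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict("رائع"))  # 5
```

Naive Bayes is convenient for a sketch because BoW counts map directly onto its per-class word likelihoods. In a real experiment, a library implementation and a held-out test split would be used to measure accuracy.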