Dish comment summarization based on bilateral topic analysis

With the prosperity of online services enabled by Web 2.0, a huge amount of human-generated commentary data is now available on the Internet, covering a wide range of products across different domains. Such comments contain valuable information for other customers, but are usually difficult to utilize due to the lack of a common description structure, the complexity of opinion expression, and the fast-growing data volume. Comment-based summarization of restaurants is even more challenging than that of other types of products and services, as users’ comments on restaurants usually mix opinions on different dishes but come with only one overall evaluation score for the whole experience at the restaurant. It is thus crucial to distinguish well-made dishes from poorly made ones by mining the comment archive, in order to generate meaningful and useful summaries for other potential customers. This paper presents a novel approach to the problem of restaurant comment summarization, built on a new bilateral topic analysis model for commentary text data. In the bilateral topic model, the attributes of the dishes discussed in the comments and the users’ evaluations of those attributes are treated as two independent dimensions in the latent space. Combined with new opinionated word extraction and clustering-based representation selection algorithms, our analysis technique effectively generates high-quality summaries from representative snippets of the text comments. We evaluate our proposals on two real-world comment archives crawled from the most popular English and Chinese online restaurant review web sites, Yelp and Dianping, respectively. The experimental results show that our proposals outperform baseline approaches from the literature in summarization quality by a large margin.