Building a Distributed Generic Recommender Using Scalable Data Mining Library

Recommender systems produce a list of recommended items through content-based filtering, collaborative filtering, or a hybrid combination of the two approaches. This paper presents a generic approach to collaborative filtering that uses data mining techniques to discover relationships among users and items. Using generic model techniques, a single recommender system can produce recommendations for a variety of items. The methodologies reported for developing recommender systems are not efficient for generic applications. Recommender implementations differ in how they analyze large input data to identify the similarity between users and items that indicates a user's relevant preferences.
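One way a single recommender can serve a variety of items is by treating items as opaque identifiers, so that one data model can hold preferences for unrelated item types side by side. The sketch below is a minimal illustration under stated assumptions, not the paper's data model or Mahout's API: it assumes input given as comma-separated `user,item,preference` lines, and the function name `load_preferences` and the sample IDs are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_preferences(csv_text):
    """Parse hypothetical 'user,item,preference' lines into {user: {item: preference}}.

    Item IDs are opaque strings, so ratings for books, movies, or any other
    item type can live in the same structure.
    """
    prefs = defaultdict(dict)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        user, item, value = row[0].strip(), row[1].strip(), float(row[2])
        prefs[user][item] = value
    return dict(prefs)


sample = """\
u1,book:101,4.0
u1,movie:7,5.0
u2,book:101,3.5
"""
model = load_preferences(sample)
print(model)  # {'u1': {'book:101': 4.0, 'movie:7': 5.0}, 'u2': {'book:101': 3.5}}
```

Nothing in the structure depends on what an item is, which is what keeps the downstream similarity and recommendation steps generic.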

In Apache Mahout, an extensible data mining library, the generic user-based recommender works with a data model that encapsulates the recommender's input data. The recommender framework can use any similarity metric; we chose Pearson correlation because it is fast to compute. One parameter of the user-based recommender is the user neighborhood. A fixed-size neighborhood has the advantage that recommendations are based on a limited number of the most similar users. The Hadoop software library allows distributed processing of big data across clusters of nodes. This paper describes an implementation using Apache Mahout and Hadoop, explores feasible augmentations that can improve recommendation efficiency, and reports a successful implementation of the generic recommender.
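The user-based pipeline outlined above (Pearson similarity, a fixed-size neighborhood, then recommendation) can be sketched on a single machine in plain Python. This is a minimal sketch under stated assumptions, not the paper's implementation and not Mahout's API: the function names `pearson`, `nearest_n_users`, and `recommend` are hypothetical; correlation is computed only over items both users rated; only positively correlated neighbors contribute; and each estimate is a similarity-weighted average of the neighbors' ratings, which is one common choice among several.

```python
import math
from collections import defaultdict


def pearson(a, b):
    """Pearson correlation between two users over the items both have rated.

    Returns None when fewer than two items are shared or either user's
    ratings on the shared items have zero variance (correlation undefined).
    """
    common = [item for item in a if item in b]
    n = len(common)
    if n < 2:
        return None
    xs = [a[item] for item in common]
    ys = [b[item] for item in common]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def nearest_n_users(user, prefs, n):
    """Fixed-size neighborhood: the n users most correlated with `user`."""
    scored = []
    for other, other_prefs in prefs.items():
        if other == user:
            continue
        sim = pearson(prefs[user], other_prefs)
        if sim is not None:
            scored.append((sim, other))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


def recommend(user, prefs, n_neighbors, how_many):
    """Rank items `user` has not rated by a similarity-weighted average."""
    # Assumption: neighbors with zero or negative correlation are ignored.
    neighbors = [(s, o) for s, o in nearest_n_users(user, prefs, n_neighbors) if s > 0]
    weighted_sum = defaultdict(float)
    sim_total = defaultdict(float)
    for sim, other in neighbors:
        for item, value in prefs[other].items():
            if item in prefs[user]:
                continue  # only estimate items the user has not rated
            weighted_sum[item] += sim * value
            sim_total[item] += sim
    estimates = {item: weighted_sum[item] / sim_total[item] for item in weighted_sum}
    ranked = sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:how_many]


# Hypothetical toy ratings on a 1-5 scale.
prefs = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 3, "i4": 5},
    "u3": {"i1": 1, "i2": 5, "i3": 2, "i5": 4},
    "u4": {"i1": 5, "i2": 3, "i3": 5, "i4": 3, "i5": 1},
}

print(nearest_n_users("u1", prefs, 2))  # u2 (r = 1.0), then u4 (r ≈ 0.866)
print(recommend("u1", prefs, n_neighbors=2, how_many=2))  # i4 ≈ 4.07, then i5 = 1.0
```

Here `u3` correlates negatively with `u1` and falls outside the two-user neighborhood, so its high rating for `i5` has no influence; `i4` ranks first because both neighbors rated it highly. Keeping the neighborhood size fixed bounds how many users contribute to each estimate.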