Personalized trip planning by integrating multimodal user-generated content

We address the problem of record linkage and semantic integration in the context of large collections of user-generated content. These collections are large because they contain contributions from millions of Internet users. We present an approach based on approximate string matching between the metadata associated with such data. The discovered linkages are stored in an ontology for answering queries on the integrated data sources. We demonstrate this approach in Photo Odyssey, an interactive web application that integrates multimodal content from image hosting and travel websites to create a user interface with a graphical trip plan and personalization options. We discuss several practical challenges faced in building such an application: integrating and mining large-scale multimodal user-generated data, resolving semantic heterogeneity, and applying machine learning to matching and ranking items. Photo Odyssey operates in an online manner, without relying on any previously stored knowledge base. We also describe methods to compute the relevance of images, remove bad data instances and duplicates, perform contextual filtering, and assign a category to uncatalogued images; together, these methods enable an interactive application even on Big Data with real-world characteristics.
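To make the idea of linking records through approximate string matching on metadata concrete, here is a minimal sketch. It does not reproduce the authors' matcher. It assumes that each photo carries a list of free-text tags, that the travel source supplies a list of attraction names, and that similarity is measured with the ratio from Python's standard-library `difflib.SequenceMatcher` after light normalization. The helper names (`normalize`, `similarity`, `link_records`), the toy data, and the 0.8 threshold are all hypothetical. The sketch also leaves out storing the discovered links in an ontology.

```python
import difflib
import re


def normalize(text: str) -> str:
    """Lowercase, turn runs of non-alphanumeric characters into spaces, collapse whitespace."""
    text = re.sub(r"[^0-9a-z]+", " ", text.lower())
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Approximate string similarity in [0, 1] (assumption: difflib's ratio stands in
    for whatever measure the original system uses)."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(photo_tags, attractions, threshold=0.8):
    """Hypothetical record linker: map each photo to the attraction name that best
    matches any of its tags, keeping the link only if the score meets the threshold."""
    links = {}
    for photo_id, tags in photo_tags.items():
        best_name, best_score = None, 0.0
        for name in attractions:
            for tag in tags:
                score = similarity(tag, name)
                if score > best_score:
                    best_name, best_score = name, score
        if best_score >= threshold:
            links[photo_id] = (best_name, round(best_score, 3))
    return links


# Toy metadata (illustrative only): user tags from an image host, names from a travel site.
photo_tags = {
    "p1": ["eiffeltower", "paris"],
    "p2": ["Louvre Museum"],
    "p3": ["sunset"],
}
attractions = ["Eiffel Tower", "Louvre Museum", "Notre-Dame de Paris"]

links = link_records(photo_tags, attractions)
print(links)  # {'p1': ('Eiffel Tower', 0.957), 'p2': ('Louvre Museum', 1.0)}
```

The misspelled tag `eiffeltower` still links to "Eiffel Tower" because the character-level match tolerates the missing space. The generic tag `sunset` stays unlinked because no candidate clears the threshold. A real system over millions of items would need blocking or indexing instead of this all-pairs loop.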