Entity resolution for high velocity streams using semantic measures

Entity resolution for high velocity streams using semantic measures Now-a-days large amount of data is generated from various stake holders such as data from sensors and satellites regarding environment and climate, social networking sites about messages, tweets, photos, videos and data from telecommunications etc. This big data, if processed in real-time, helps decision makers to make timely decisions when an event occurred. When source data sets are large (velocity, variety, veracity) traditional ETL (Extract, Transform, Load) is time consuming process. This paves path to extend traditional data management techniques for extracting business value from bigdata. This paper extends the hadoop framework for performing entity resolution in two phases. In phase 1 MapReduce generate rules for matching two real world objects with similarities. The more the similarity, the objects are similar. Similarity is calculated using domain dependent and independent Natural language processing measures. In Phase 2 these rules are used for matching stream data. Our proposed approach uses 13 semantic measures for resolving entities in stream data. Stream data such as tweets, messages, e-catalogues are used for testing the proposed system.