A survey on innovative approach for improvement in efficiency of caching technique for big data application

Big Data plays an important role in scientific, industrial and academic areas. Information is generated every day by millions of computing machines and collected for future use. Big data is useful for business needs, scientific research, future predictions for community welfare, lifestyle enhancements, etc. The data collected every day is around 50TB at Google and 20TB at Twitter, which is huge in volume, velocity and variety. Big data issues need to be resolved through high-performance computing machines and large-scale clusters of nodes, with the help of distributed processing technologies and software tools such as MapReduce from Google and Hadoop from the Apache foundation and its ecosystem. These technologies and tools need to be modified to make use of intermediate data, improve the efficiency of output, minimize overhead on processors, provide efficient storage and improve security.

This survey is presented to become acquainted with the field and to plan further enhancements for future requirements. Among the various open issues for big data, one observation is that the large amount of intermediate data generated by map and reduce operations is not reused once a task finishes and is simply discarded, and that incremental computations are not handled well by existing cache mechanisms. Hence the research aims to use the cache mechanism efficiently to optimize computational time and reduce storage overhead for real-time data over the distributed file system (DFS). The survey covers the orientation of the big data domain; the technologies used to execute big data applications and their ecosystem; a literature survey of existing practices for optimizing computational time, reducing storage space, and improving performance, efficiency, scalability and architecture; and a proposed new system architecture to achieve these aspects.
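The idea of reusing intermediate map output for incremental computation can be illustrated with a minimal sketch. It is not the architecture proposed in this survey and does not use the Hadoop API: the names `IntermediateCache`, `map_word_count` and `run_job` are hypothetical, the cache is an in-memory dictionary rather than a DFS-backed store, and each input split is assumed to be identified by a hash of its content.

```python
import hashlib
from collections import Counter


class IntermediateCache:
    """Hypothetical in-memory cache of map outputs, keyed by a hash of the input split."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, split, map_fn):
        # Assumption: identical split content always yields identical map output,
        # so a content hash is a safe cache key.
        key = hashlib.sha256(split.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = map_fn(split)
        return self._store[key]


def map_word_count(split):
    # Map phase: count the words in one input split.
    return Counter(split.split())


def run_job(splits, cache):
    # Reduce phase: merge per-split counts, reusing cached map output where available.
    total = Counter()
    for split in splits:
        total.update(cache.get_or_compute(split, map_word_count))
    return total


cache = IntermediateCache()
run_job(["big data cache", "map reduce cache"], cache)
# Incremental run: only the newly added third split is actually mapped.
result = run_job(["big data cache", "map reduce cache", "new data split"], cache)
print(result["cache"], cache.hits, cache.misses)
```

In this sketch the second job recomputes only the split that was not seen before. A real deployment would additionally need cache eviction and invalidation when input data changes, which are outside the scope of this illustration.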