PABIRS: A data access middleware for distributed file systems

PABIRS: A data access middleware for distributed file systems Various big data management systems have emerged to handle different types of applications, which cast very different demands on storage, indexing and retrieval of large amount of data on distributed filesystem. Such diversity on demands has raised huge challenges to the design of new generation of data access service for big data. In this paper, we present PABIRS, a unified data access middleware to support mixed workloads. PABIRS encapsulates the underlying distributed file system (DFS) and provides a unified access interface to systems such as MapReduce and key-value stores. PABIRS achieves dramatic improvement on efficiency by employing a novel hybrid indexing scheme. Based on the data distribution, the indexing scheme adaptively builds bitmap index and Log Structured Merge Tree (LSM) index. Moreover, PABIRS distributes the computation to multiple index nodes and utilizes a Pregel-based algorithm to facilitate parallel data search and retrieval. We empirically evaluate PABIRS against other existing distributed data processing systems and verify the huge advantages of PABIRS on shorter response time, higher throughput and better scalability, over big data with real-life phone logs and TPC-H benchmark.