CHive: Bandwidth Optimized Continuous Querying in Distributed Clouds

Bandwidth-efficient execution of online big data analytics in telecommunication networks demands tailored solutions. Existing streaming analytics systems are designed to operate in large data centers and assume unlimited bandwidth between data center nodes. Applying these solutions unmodified to distributed telecommunication clouds overlooks the fact that available bandwidth is a scarce and costly resource, the very resource that makes the telecommunication network valuable to end-users. This article presents Continuous Hive (CHive), a streaming analytics platform tailored for distributed telecommunication clouds. The fundamental contribution of CHive is that it optimizes query plans to minimize their overall bandwidth consumption when deployed in a distributed telecommunication cloud. Additionally, these optimized query plans have a high degree of parallelism built in, which benefits execution speed. Early experiments on data from a large mobile operator indicate that CHive can yield bandwidth reductions upwards of 99 percent.