Live datastore transformation for optimizing big data applications in cloud environments

Vendor lock-in is one of the major issues preventing companies from moving their big data applications to the cloud or from switching between cloud providers. Choosing a provider based on the datastores it offers can be advantageous at first, but as an application changes, the chosen datastore may no longer be optimal after some time: requirements shift with frequent updates and feature requests, and scalability issues arise as the number of users changes. In this paper we propose a framework for the live transformation of the schema and data of datastores. Because it is built on a canonical data model, the framework can easily be extended to additional datastores. The framework performs the transformation on two levels: a batch layer transforms a snapshot of the datastore, while a speed layer transforms the queries that insert new data or update existing data in the datastore. A transformation between MySQL and Cassandra is given as a proof of concept. We show the correctness of the transformation and provide performance results in terms of transformation times and overhead.
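The two-level idea can be illustrated with a minimal Python sketch. It is not the framework's implementation: the names `CanonicalRecord`, `TargetStore`, `batch_layer`, and `speed_layer` are hypothetical, the target datastore is an in-memory stand-in rather than Cassandra, live writes are assumed to carry full rows, and conflicts between the two layers are assumed to be resolved by a last-write-wins version stamp, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class CanonicalRecord:
    """Datastore-neutral form of one row (hypothetical canonical model)."""
    table: str
    key: tuple      # primary-key values
    values: dict    # non-key column name -> value
    version: int    # ordering stamp; an assumption used to merge both layers


class TargetStore:
    """In-memory stand-in for the target datastore."""

    def __init__(self):
        self.rows = {}  # (table, key) -> CanonicalRecord

    def upsert(self, rec):
        current = self.rows.get((rec.table, rec.key))
        # Assumed conflict rule: last write wins, so a stale snapshot row
        # never overwrites a newer live write.
        if current is None or rec.version >= current.version:
            self.rows[(rec.table, rec.key)] = rec


def to_canonical(table, pk_cols, row, version):
    """Split a source row (a dict) into key and non-key parts."""
    key = tuple(row[c] for c in pk_cols)
    values = {c: v for c, v in row.items() if c not in pk_cols}
    return CanonicalRecord(table, key, values, version)


def batch_layer(snapshot, table, pk_cols, snapshot_version, target):
    """Transform every row of a snapshot taken at snapshot_version."""
    for row in snapshot:
        target.upsert(to_canonical(table, pk_cols, row, snapshot_version))


def speed_layer(writes, table, pk_cols, target):
    """Transform writes that arrive after the snapshot was taken."""
    for version, row in writes:
        target.upsert(to_canonical(table, pk_cols, row, version))


def demo():
    target = TargetStore()
    snapshot = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
    live_writes = [
        (1, {"id": 2, "name": "bobby"}),  # update of an existing row
        (2, {"id": 3, "name": "carol"}),  # insert of a new row
    ]
    # The speed layer runs first here to show that the merged result does
    # not depend on which layer reaches the target first.
    speed_layer(live_writes, "users", ("id",), target)
    batch_layer(snapshot, "users", ("id",), 0, target)
    return {key: rec.values for (_table, key), rec in target.rows.items()}
```

In this sketch, calling `demo()` leaves the update to row 2 intact even though the older snapshot row is applied afterwards, because the snapshot carries the lower version stamp. A real deployment would also need to handle deletes and partial updates, which are left out here.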