FlexiMod: Flexible Coexistence Support for Programming Models

The rapid growth in big data is driving the development and evolution of numerous analytics frameworks optimized for the different needs of applications. Emerging big data applications comprise rich multi-faceted workflows with both compute-intensive and data-intensive tasks with intricate communication patterns. Thus, a single framework cannot support all application types and needs. For example, while the MapReduce model has proven to be effective for common data-intensive tasks with well-defined execution phases, the MPI programming model may be better suited for extracting high performance for compute-intensive tasks and handling arbitrary communication patterns. Researchers have recognized this need to employ specialized models for different phases of a workflow, e.g., performing computations using MPI followed by visualizations using MapReduce. As a result, compromises have to be made either to use multi-cluster approaches that entail large data movement across clusters, or to sacrifice some aspects of the applications, e.g., using MapReduce alone at the cost of a higher communication overhead. Consequently, there is a crucial need for supporting coexisting disparate programming models on the same set of resources that are managed in a holistic manner. The objective of this research is to provide an efficient solution to the above problem by designing FLEXIMOD, a holistic approach for supporting coexistence of multiple programming models. The envisioned solution includes a user-friendly workflow generation tool, a runtime environment that feeds the different tasks to different programming frameworks and transparently executes the workflow, and an underlying scheduling system that coordinates and co-hosts different frameworks on the same set of resources in a multi-tenant environment.
Our pilot project, GERBIL, is a framework for co-hosting unmodified MPI applications alongside MapReduce applications on top of YARN. GERBIL bridges the fundamental mismatch between YARN and MPI by designing an MPI-aware resource allocation mechanism. Our initial evaluation shows that GERBIL enables MPI executions with performance comparable to a native MPI setup, and improves compute-intensive application performance by up to 133% when compared to the corresponding MapReduce versions of the applications.