Improving the integration process of large software systems

Improving the integration process of large software systems Software integration is the software engineering activity where code changes of different developers are combined into a consistent whole. While the advent of distributed version control systems has allowed distributed development to scale up substantially, at the same time the risk of integration conflicts (changes that do not go well together) and the time required to fix them has increased as well. In order to help practitioners deal with this paradox, this thesis aims to understand and improve the integration process of modern software organizations. We took the Linux kernel as our pilot case study, which is supported by a distributed version control system (Git) and low-tech reviewing system (mailing list). So far, we have (1) analyzed how to reconstruct the data of the integration process in a low-tech environment where reviews are stored in emails without explicit link to version control commits, and (2) studied the characteristics of the Linux integration process. We found that the commits developed by more mature developers and impacting less subsystems are more likely to be accepted. As a next step, we plan to build a model quantifying the integration effort.