Distributed Integrated Development Environment for CO2-Reduction in Software Development
Project Idea Metadata
- Project Idea Name: Distributed Integrated Development Environment for CO2-Reduction in Software Development
- Date: 10/5/2021 2:52:35 PM
- Administrators:
Project Idea Description
1 Project Description
1.1 Challenge
An increasing number of people develop software. Estimates for 2019 range from 18.9 million to 23.9 million software developers worldwide. All these developers require computing resources to build software. In general, these resources are provided by local computing devices (desktops or laptops). Developers need powerful devices to stay productive, since powerful devices reduce the completion time of computationally intensive tasks such as program analysis, compilation, and testing. Manufacturing and operating these devices generates CO2 emissions. Given the large number of software developers, even a small reduction per developer adds up to a substantial reduction in total emissions.
The goal of this project is to reduce the CO2 footprint of the development process by creating a distributed integrated development environment (dist. IDE). Such an IDE makes it possible to:
- Perform computationally expensive tasks in regions with a low grid carbon intensity.
- Reduce CO2 emissions of the manufacturing process by prolonging usage of devices.
- Use low-power devices such as tablet-class devices for development work.
1.2 Potential customers
Our proposed solution provides multiple benefits to potential customers:
Incremental reduction of local energy consumption and of hardware investments: We propose a component-based distributed IDE that exists outside closed environments such as AWS, Azure, or Google Cloud and can be integrated component by component into existing development system landscapes.
Better developer experience: Performing compilation on a very fast device can reduce the end-to-end compilation time, even when this compilation is performed at a remote location.
Better developer collaboration: One of the distributed IDE components realizes the storage of the working copy (e.g., storing currently modified source files). Sharing of working copies may provide opportunities for better collaboration (e.g., remote pair programming or ad-hoc reviews).
Becoming a service provider: Companies can provide components of the distributed IDE as a service themselves and thus open up another source of income.
1.3 Solution Description
The proposed approach decomposes an IDE into multiple stand-alone components that run on distributed systems. The IDE is split into the following four components:
- Build server. A build server is often part of a continuous integration pipeline. To serve as an IDE component, it needs a suitable interface.
- Execute and debug. A component that executes the program under development.
- Working Copy. This is a stateful service that stores and provides access to changes to specific streams of a repository. It may also serve as refactoring or lookup service.
- User interface. Either a native application or a web application that provides a user interface for code editing and interfacing with the other components.
The proposed modularization makes it possible to move expensive workloads that tolerate a higher latency (i.e., everything except the UI) to remote systems located in regions with low grid carbon intensity.
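The decomposition above can be sketched as a set of interfaces. This is a minimal, hypothetical illustration: the names `WorkingCopy`, `BuildServer`, `Executor`, `Artifact`, and `edit_compile_run` are assumptions rather than a specified API, and the in-memory fakes stand in for remote services. Only the user interface, represented here by `edit_compile_run`, has to run on the developer's device; the other three components could be hosted anywhere.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class Artifact:
    revision_id: str  # working-copy revision the artifact was built from
    payload: bytes


class WorkingCopy(Protocol):
    """Stateful service storing the uncommitted changes of a repository stream."""
    def write_file(self, path: str, text: str) -> str: ...  # returns new revision id
    def snapshot(self, revision_id: str) -> Dict[str, str]: ...


class BuildServer(Protocol):
    """Latency-tolerant and compute-heavy: can run in a low-carbon region."""
    def build(self, revision_id: str) -> Artifact: ...


class Executor(Protocol):
    """Runs the program under development and returns its exit code."""
    def run(self, artifact: Artifact) -> int: ...


# --- Hypothetical in-memory fakes standing in for remote services ---
class InMemoryWorkingCopy:
    def __init__(self) -> None:
        self._revisions: Dict[str, Dict[str, str]] = {"rev-0": {}}
        self._head = 0

    def write_file(self, path: str, text: str) -> str:
        files = dict(self._revisions[f"rev-{self._head}"])
        files[path] = text
        self._head += 1
        revision_id = f"rev-{self._head}"
        self._revisions[revision_id] = files
        return revision_id

    def snapshot(self, revision_id: str) -> Dict[str, str]:
        return dict(self._revisions[revision_id])


class FakeBuildServer:
    def __init__(self, working_copy: WorkingCopy) -> None:
        self._wc = working_copy

    def build(self, revision_id: str) -> Artifact:
        files = self._wc.snapshot(revision_id)
        # "Compilation" here just concatenates the sources in path order.
        payload = "\n".join(files[p] for p in sorted(files)).encode("utf-8")
        return Artifact(revision_id, payload)


class FakeExecutor:
    def run(self, artifact: Artifact) -> int:
        return 0 if artifact.payload else 1


def edit_compile_run(wc: WorkingCopy, bs: BuildServer, ex: Executor,
                     path: str, text: str) -> int:
    """One edit-compile-run cycle as driven by the local user interface."""
    revision_id = wc.write_file(path, text)
    artifact = bs.build(revision_id)
    return ex.run(artifact)


if __name__ == "__main__":
    wc = InMemoryWorkingCopy()
    print(edit_compile_run(wc, FakeBuildServer(wc), FakeExecutor(), "main.py", "print('hi')"))
```

Because each component only sees the others through a narrow interface, a company could swap, say, the fake build server for a remote one without touching the UI.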
1.4 Energy Savings and Reduction of CO2 Emissions
The goal of the project lies in reducing CO2 emissions by transferring computationally expensive tasks from the client-side (the programmer’s computer) to the server-side (a data center). To estimate the reduction, we look at three different parts of the calculation:
- the reduction on the client side,
- the increase on the server side,
- and the communication costs between client and server.
We provide simplified estimations to show the potential savings.
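The three parts combine into a simple energy balance. The notation is ours and is not taken from the cited studies; the distributed IDE pays off when the net value is positive:

```latex
\Delta E_{\text{net}} = \Delta E_{\text{client}} - \Delta E_{\text{server}} - E_{\text{comm}}
```

Here $\Delta E_{\text{client}}$ is the client-side reduction (e.g., manufacturing energy saved by prolonging device usage), $\Delta E_{\text{server}}$ the additional data-center energy including infrastructure, and $E_{\text{comm}}$ the energy of the network communication between client and server.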
1.4.1 Client-side Energy Reduction
When looking at client-side reductions, we focus on laptop computers, as these are currently the prevalent class of computers. The study of Denga et al. (see attached PDF for references) estimated that about 70% of a laptop's lifecycle energy is consumed during manufacturing and only 30% during its entire operational phase, which they estimated at 2.9 years. The study reports the following figures:
- Estimated production energy of a laptop computer: 3009 MJ to 4339 MJ (CO2: 227 kg to 270 kg).
- Operation energy of a laptop computer over a 2.9-year lifespan: 1781 MJ (CO2: 159 kg).
In contrast to other electrical appliances such as refrigerators, manufacturing a computer is much more energy-intensive than operating it. Thus, buying a new device that consumes less energy does not easily save energy. This is also supported by the study of Hampus et al., which shows that using second-hand laptop computers reduces the carbon footprint as well.
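The figures above can be checked with a few lines of Python. This is a sketch that uses only the numbers quoted from Denga et al.; for the savings estimate it assumes the midpoint of the production range and a lifetime extension from 3 to 6 years (operation energy per year is unaffected by the lifetime, so only manufacturing energy is amortized). The manufacturing share comes out at roughly 63% to 71%, and the annual savings at about 612 MJ, close to the 611 MJ used in Section 1.4.2.

```python
# Figures from Denga et al. as quoted above (energy in MJ).
PRODUCTION_MJ = (3009.0, 4339.0)  # manufacturing energy, low and high estimate
OPERATION_MJ = 1781.0             # operation over a 2.9-year lifespan


def manufacturing_share(production_mj: float, operation_mj: float) -> float:
    """Fraction of lifecycle energy spent on manufacturing."""
    return production_mj / (production_mj + operation_mj)


def annual_embodied_mj(production_mj: float, lifetime_years: float) -> float:
    """Manufacturing energy amortized over the years the device is used."""
    return production_mj / lifetime_years


for production in PRODUCTION_MJ:
    share = manufacturing_share(production, OPERATION_MJ)
    print(f"{production:.0f} MJ production -> {share:.0%} of lifecycle energy")

# Assumption: midpoint of the production range, device kept 6 instead of 3 years.
midpoint = sum(PRODUCTION_MJ) / 2
savings = annual_embodied_mj(midpoint, 3) - annual_embodied_mj(midpoint, 6)
print(f"annual embodied-energy savings: {savings:.0f} MJ")
```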
1.4.2 Server-side Energy Costs
Additional costs are incurred on the server side. For simplicity, we assume that a computation costs the same energy whether it is executed on the programmer's computer or in the data center. In addition to the computation itself, a data center requires infrastructure such as cooling, whose energy costs can be as large as the computation costs. For example, the United States Data Center Energy Usage Report gives the following distribution of data center energy use for 2014:
- Infrastructure: 44% (including cooling)
- Network: 2%
- Storage: 5%
- Servers: 48% (operation of the computers)
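Based on this report, we approximate the additional infrastructure costs from the computation costs: since the infrastructure (44%) consumes almost as much energy as the servers themselves (48%), the server-side energy can be up to roughly twice the computation energy.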
To make a quick estimate of these costs, we calculate a scenario in which a software developer works 240 days a year and each day executes M full builds of variable duration and N incremental builds with a fixed duration of 5 seconds. We assume that the machine executing the build draws a constant 200 W during any build. Even with M = 3 full builds (270 s each) and N = 40 incremental builds per day, the annual server-side computation energy (240 × (3 × 270 s + 40 × 5 s) × 200 W ≈ 48 MJ) stays far below the annual savings of using a laptop computer for 6 instead of 3 years (611 MJ). This remains true even if the infrastructure overhead roughly doubles the server-side figure.
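The scenario above can be reproduced with a small calculator. It is a sketch under stated assumptions: the parameter names are hypothetical, 270 s is read as the duration of each full build, and the infrastructure overhead factor is derived from the 2014 breakdown as (servers + infrastructure) / servers, ignoring network and storage. With the default parameters, the build machine uses 48.48 MJ per year, or about 93 MJ including infrastructure.

```python
from dataclasses import dataclass


@dataclass
class BuildScenario:
    """Daily build load of one developer (parameters from the scenario above)."""
    work_days: int = 240
    full_builds_per_day: int = 3
    full_build_s: float = 270.0        # assumption: duration of EACH full build
    incremental_builds_per_day: int = 40
    incremental_build_s: float = 5.0
    power_w: float = 200.0             # constant draw during any build

    def compute_mj(self) -> float:
        """Annual energy of the build machine itself, in MJ."""
        seconds_per_day = (self.full_builds_per_day * self.full_build_s
                           + self.incremental_builds_per_day * self.incremental_build_s)
        return self.work_days * seconds_per_day * self.power_w / 1e6


# 2014 breakdown: servers 48 %, infrastructure (incl. cooling) 44 %.
# Hypothetical overhead factor: (servers + infrastructure) / servers.
SERVER_SHARE = 0.48
INFRA_SHARE = 0.44
OVERHEAD_FACTOR = (SERVER_SHARE + INFRA_SHARE) / SERVER_SHARE

ANNUAL_LAPTOP_SAVINGS_MJ = 611.0  # 6 instead of 3 years, from Section 1.4.2

compute = BuildScenario().compute_mj()
with_infra = compute * OVERHEAD_FACTOR
print(f"compute only: {compute:.1f} MJ, with infrastructure: {with_infra:.1f} MJ")
print("server-side cost below laptop savings:", with_infra < ANNUAL_LAPTOP_SAVINGS_MJ)
```

Varying `full_build_s` or `full_builds_per_day` shows how much build load the savings from a longer laptop lifetime can absorb.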
1.4.3 Communication Costs
Distributing an application across multiple systems adds the cost of network communication. To keep the additional energy costs of this communication low, we propose to use incremental build steps and caches. The system can exploit the fact that the changes to the program code in each edit-compile-run cycle are typically small: a previous state can be cached under the version control revision id or build id and then updated with only the new changes.
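The caching idea can be sketched as follows. This is a minimal illustration, assuming a hypothetical server-side `SnapshotCache` that stores content hashes per revision id; the client then uploads only files whose hash differs from the cached base revision, plus a list of deleted paths. If the base revision is unknown to the server, the sketch falls back to a full transfer.

```python
import hashlib
from typing import Dict, List, Tuple


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class SnapshotCache:
    """Server-side cache: revision id -> {path: content hash} (hypothetical)."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, Dict[str, str]] = {}

    def store(self, revision_id: str, files: Dict[str, str]) -> None:
        self._snapshots[revision_id] = {path: digest(text) for path, text in files.items()}

    def get(self, revision_id: str) -> Dict[str, str]:
        # An unknown revision yields an empty snapshot, i.e. a full transfer.
        return self._snapshots.get(revision_id, {})


def changed_files(base: Dict[str, str],
                  current: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Files to upload (new or modified) and paths deleted since the cached base."""
    upload = {path: text for path, text in current.items() if base.get(path) != digest(text)}
    deleted = sorted(set(base) - set(current))
    return upload, deleted


cache = SnapshotCache()
cache.store("rev-41", {"a.py": "x = 1\n", "b.py": "y = 2\n", "c.py": "z = 3\n"})
working_copy = {"a.py": "x = 1\n", "b.py": "y = 20\n"}  # b.py edited, c.py deleted
upload, deleted = changed_files(cache.get("rev-41"), working_copy)
print(sorted(upload), deleted)  # ['b.py'] ['c.py']
```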
1.5 Current Status and Related Work
This proposal connects the following three research areas that currently show high activity:
Online Integrated Development Environments provide an entire IDE as a web application. There are numerous examples of such online IDEs; popular ones include AWS Cloud9, Codeanywhere, Repl.it, and Koding. The main issue with such online IDEs is that they are not modular, i.e., a company must switch to an online IDE in its entirety. For established companies, such a change is a big effort both for the developers and for the systems. Some providers of online IDEs have already recognized this; e.g., Koding allows projects managed by their service to be mounted into an OS X or Linux filesystem. Such ad-hoc measures let developers access their source files from within their favorite IDE, but they essentially reduce the remote service to a remote filesystem, with the caveat that IDEs do not always support remote filesystems (e.g., IntelliJ). In this project, we propose a more modular approach: companies can replace parts of their current IDE with distributed components without having to change the entire IDE at once.
An additional problem of online IDEs is potential vendor lock-in, as many of them do not offer an on-premises installation (Koding is an exception). We propose a more incremental approach: we divide an IDE into components that can be hosted independently on-premises or combined with components from other providers through a standardized interface (second source).
Incremental build. Most development work is spent in short edit-compile-run cycles, in which a developer typically makes only a few changes. Many tools support incremental compilation, either directly in the compiler (e.g., Rust or Eclipse) or via a build tool (e.g., GNU Make or, at a finer granularity, Gradle). Building on these incremental compilation and build features, the build component of the proposed distributed IDE must also provide guidelines for implementing incremental packaging and artifact generation, ideally complemented by incremental analysis steps (e.g., linting with PMD).
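The core of such an incremental step is an up-to-date check. The sketch below is a hypothetical illustration of the general hash-based technique, not the mechanism of any particular tool: each target stores a hash over the paths and contents of its direct inputs, and only targets whose hash changed are rebuilt. Transitive dependencies are not modeled, and `compile_fn` is a placeholder for the real compiler, packager, or linter call.

```python
import hashlib
from typing import Callable, Dict, List


def inputs_key(sources: Dict[str, str], inputs: List[str]) -> str:
    """Hash over the paths and contents of a target's direct inputs."""
    h = hashlib.sha256()
    for path in sorted(inputs):
        h.update(path.encode("utf-8") + b"\0")
        h.update(sources[path].encode("utf-8") + b"\0")
    return h.hexdigest()


def incremental_build(targets: Dict[str, List[str]], sources: Dict[str, str],
                      stamps: Dict[str, str],
                      compile_fn: Callable[[str, List[str]], None]) -> List[str]:
    """Rebuild only targets whose inputs changed since their stamp was recorded."""
    rebuilt = []
    for target, inputs in targets.items():
        key = inputs_key(sources, inputs)
        if stamps.get(target) != key:
            compile_fn(target, [sources[p] for p in sorted(inputs)])
            stamps[target] = key
            rebuilt.append(target)
    return rebuilt


def no_op(target: str, contents: List[str]) -> None:
    """Placeholder for the real compiler invocation."""


targets = {"core.o": ["core.c", "util.h"], "main.o": ["main.c", "util.h"], "cli.o": ["cli.c"]}
sources = {"core.c": "int core;", "main.c": "int main;", "cli.c": "int cli;",
           "util.h": "#define X 1"}
stamps: Dict[str, str] = {}

first = incremental_build(targets, sources, stamps, no_op)   # all three targets
unchanged = incremental_build(targets, sources, stamps, no_op)  # nothing to do
sources["util.h"] = "#define X 2"
second = incremental_build(targets, sources, stamps, no_op)  # only targets using util.h
print(first, unchanged, second)
```

Keeping the stamps on the build server lets consecutive edit-compile-run cycles reuse earlier work even though the client never stores build outputs.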
Continuous Integration and Build Servers. The proposed build component already exists to some degree in continuous integration systems (e.g., GitLab, Jenkins). Such build servers typically set up a container on a remote machine and execute the user-specified build pipeline in that container. This suffices as long as these build servers are tightly integrated with continuous integration or DevOps systems, but a modular distributed development environment needs the following additional functionality:
- The incremental build functionality as detailed in the above paragraph.
- A stable and concise interface for the main IDE to control the build process.
- A stable and concise interface to transfer the artifact to its repository or directly to the machine that is to run the artifact in an edit-compile-run cycle.
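The two interfaces listed above could be carried by a few small messages. The sketch below is hypothetical: the message types, field names, and state names are assumptions and not the API of GitLab, Jenkins, or any other existing build server. It shows a build request, a build status that only allows legal state transitions, and JSON serialization as the body of the REST calls to be designed in WP#2.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class BuildState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Allowed state transitions; terminal states have none.
TRANSITIONS = {
    BuildState.QUEUED: {BuildState.RUNNING},
    BuildState.RUNNING: {BuildState.SUCCEEDED, BuildState.FAILED},
    BuildState.SUCCEEDED: set(),
    BuildState.FAILED: set(),
}


@dataclass
class BuildRequest:
    revision_id: str          # working-copy revision to build
    target: str               # module or task to build
    incremental: bool = True  # reuse cached state of the previous build


@dataclass
class BuildStatus:
    build_id: str
    state: BuildState = BuildState.QUEUED
    artifact_location: Optional[str] = None  # where the executor fetches the artifact

    def advance(self, new_state: BuildState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state


def to_json(message) -> str:
    """Serialize a request or status message; enums become their string values."""
    data = {k: (v.value if isinstance(v, Enum) else v) for k, v in asdict(message).items()}
    return json.dumps(data, sort_keys=True)


status = BuildStatus("b-1")
status.advance(BuildState.RUNNING)
status.advance(BuildState.SUCCEEDED)
status.artifact_location = "artifacts/b-1/app.jar"  # hypothetical location
print(to_json(BuildRequest("rev-42", "app")))
print(to_json(status))
```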
2 Work Packages Overview
The project is divided into four work packages.
WP#1: Project Management: 20 hours
WP#2: Specification: 40 hours
The goal of this work package lies in designing the architecture of the distributed IDE prototype (APIs, communication, and functionality of the components) and selecting the target IDE:
- Design the initial version of the REST APIs between the four components.
- Fine-grained specification of the three server-side components (working copy, build server, execution component).
- Evaluate which IDE (IntelliJ, Eclipse, NetBeans, etc.) to use in the prototype.
WP#3: Realization: 240 hours
The goal of this work package is to implement the prototype of a distributed IDE. We divide this phase into three sub-work packages, one for each of the three components (working copy, build server, execution component):
- Setup of development environment.
- (WP 3a) Working copy component
- (WP 3b) Build component
- (WP 3c) Basic execution component
- Documentation
WP#4: Evaluation: 60 hours
The goal of this work package lies in evaluating the prototype with respect to energy consumption, communication overhead, and developer experience:
- Measure power usage before and after introducing the distributed IDE, using representative tests.
- Measure communication bandwidth before and after introducing the distributed IDE, using representative tests.
- Test whether the development experience improves when using old or weak equipment.
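For the power measurements, sampled readings have to be turned into energy. The sketch below assumes a hypothetical data source delivering `(timestamp in s, power in W)` pairs, e.g. from an external power meter, and integrates them with the trapezoidal rule; the sample values are illustrative only, not measurements. Note that this covers one device; a full comparison must add the server-side and communication energy from Section 1.4.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in s, power in W)


def energy_joules(samples: List[Sample]) -> float:
    """Integrate power samples over time with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be sorted by timestamp")
        total += (t1 - t0) * (p0 + p1) / 2
    return total


def relative_saving(before_j: float, after_j: float) -> float:
    """Fraction of energy saved by the 'after' configuration."""
    return (before_j - after_j) / before_j


# Illustrative readings only, not measurements.
local_build = [(0.0, 30.0), (10.0, 50.0), (20.0, 30.0)]
remote_build = [(0.0, 10.0), (10.0, 10.0), (20.0, 10.0)]
before, after = energy_joules(local_build), energy_joules(remote_build)
print(f"{before:.0f} J -> {after:.0f} J ({relative_saving(before, after):.0%} less)")
```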
A modular integrated development environment to reduce the CO2 footprint of the software development process:
- Perform computationally expensive tasks in regions with low grid carbon intensity.
- Reduce CO2 emissions of manufacturing processes by prolonging device usage.
- Use low-power devices (e.g., tablets) for development work.
Benefits of innovation:
- Incremental reduction of local energy consumption and reduction of hardware investments
- Better developer experience
- Better collaboration
- Become a service provider