Image of a windmill, standing for sustainability

Sustainable AI – How environmentally friendly is AI really?

29 June 2022

Sustainable AI is becoming increasingly important. But how sustainable are AI models really? Big tech and smaller applications differ greatly in this respect. We have looked at how sustainable small-scale AI really is, what open questions remain and what recommendations can be made.

Sustainable AI and AI for Sustainability

The potential of AI in the fight against climate change has been discussed for quite some time. For example, Rolnick et al. (2023) surveyed many (potential) use cases, such as “Enabling low-carbon electricity” or “Managing forests”. However, there are also concerns about the sustainability of AI itself. Researchers often distinguish between sustainable AI and AI for sustainability. While the latter serves purposes like increasing the efficiency of renewable energy, the former is about making AI itself sustainable, because what often remains unsaid is that many AI models consume massive amounts of energy.

The bigger the better

In a widely cited study, Emma Strubell and colleagues experimented with natural language models and found that training some of these models emits as much CO2 as five cars over their entire lifetimes. In another famous study (Fig 1), researchers at OpenAI looked at the growth of AI models in recent years and found that their size, measured by the compute used for training, doubled every 3.4 months, contributing to a striking increase in energy consumption.
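To get a feeling for what a doubling time of 3.4 months means, the following minimal sketch converts it into a growth factor over a longer period. It assumes smooth exponential growth, which is a simplification; the function name `growth_factor` is ours and not taken from the cited study.

```python
def growth_factor(months: float, doubling_time_months: float = 3.4) -> float:
    """Factor by which a quantity grows over `months`, given a fixed doubling time.

    Assumes idealised exponential growth: factor = 2 ** (months / doubling_time).
    """
    if doubling_time_months <= 0:
        raise ValueError("doubling_time_months must be positive")
    return 2.0 ** (months / doubling_time_months)


if __name__ == "__main__":
    # Growth over one and two years at a 3.4-month doubling time.
    for months in (12, 24):
        print(f"{months} months: x{growth_factor(months):.1f}")
```

Under this assumption, a 3.4-month doubling time corresponds to growth by more than a factor of ten within a single year.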

Fig 1 The increasing size of AI models over the past years. Petaflop/s-days measures the compute used for training and roughly denotes the size of a model. With increasing size, the energy consumption increases, too. The authors from OpenAI computed the size based on the original research papers of the respective models. For this post, we took a closer look at the authors of those papers and added their institutional affiliation to the plot. (Google, i.e. Google Brain, and DeepMind are subsumed under Alphabet.)

Based on these results, it is often claimed that the current trend towards bigger and bigger models is far from sustainable. Without denying this, a closer look at the numbers is striking: the data show that the vast majority of the models in Fig 1 were built by Big Tech companies like Alphabet, Microsoft, or Baidu. It is no secret that AI research is no longer driven by “normal” universities. The reason is financial: training such models is extremely costly. For example, Strubell et al. (2019) report that the costs for cloud computing can be around $100,000.

Sustainability on Huggingface

Unfortunately, there are no reliable numbers on the energy consumption of smaller AI projects. The only exception is an experiment conducted by Marcus Voß from Birds on Mars for the study “Nachhaltigkeitskriterien für künstliche Intelligenz” (“Sustainability criteria for artificial intelligence”). We were able to reproduce and update the results (Fig 2). The data source is the (self-reported) emissions of models found on Huggingface, a hosting platform for AI models. These models can be downloaded and fine-tuned for one’s own purposes, and since they are freely available, they give some insight into smaller AI projects.

Fig 2 CO2 emissions of Huggingface models for different tasks such as automatic translation or text summarisation. The numbers come from so-called model cards. Recently, these cards have been used not only for documenting accuracy metrics but also for documenting what the model emitted during training. The data on transportation can be found here and for streaming here. We want to thank Marcus Voß for his support.

The experiment shows that the emissions from training a model are not always excessively high. Streaming in 4k, for example, has a stronger impact than most of the above models. One might say that if you skip the newest episode of Bridgerton today, you can train your Huggingface model with a clear conscience tomorrow.

However, the question is: who trained the models, and where were they trained? Unfortunately, it is difficult to find out who exactly contributes to Huggingface. But this is crucial, since only smaller projects matter for our purposes. Finding out the geographical location of the training is almost impossible, too. Depending on the location, the emissions can vary significantly, because electricity from renewable sources causes far fewer emissions than electricity from fossil fuels. This makes it difficult to compare the model emissions directly.
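The self-reported numbers discussed in this section live in the metadata of a model card. The sketch below shows how such a value could be read once the metadata is available as a dictionary. It is an illustration, not the procedure used for Fig 2: we assume the metadata key is called `co2_eq_emissions`, that it is either a plain number or a mapping with an `emissions` entry (in grams) and an optional `geographical_location`, and the helper names are hypothetical.

```python
from typing import Any, Mapping, Optional


def reported_emissions_grams(card_data: Mapping[str, Any]) -> Optional[float]:
    """Return the self-reported training emissions, or None if nothing usable is given.

    Assumption: the value sits under "co2_eq_emissions", either as a number
    or as a mapping that has an "emissions" entry.
    """
    value = card_data.get("co2_eq_emissions")
    if isinstance(value, Mapping):
        value = value.get("emissions")
    try:
        return float(value)
    except (TypeError, ValueError):
        # Missing or malformed entries are treated as "not reported".
        return None


def reported_location(card_data: Mapping[str, Any]) -> Optional[str]:
    """Return the self-reported training location, if the card states one."""
    value = card_data.get("co2_eq_emissions")
    if isinstance(value, Mapping):
        return value.get("geographical_location")
    return None


if __name__ == "__main__":
    # Hypothetical card metadata, for illustration only.
    card = {"co2_eq_emissions": {"emissions": 12.5, "geographical_location": "somewhere"}}
    print(reported_emissions_grams(card), reported_location(card))
```

Returning `None` instead of zero keeps models without a report separate from models that report very low emissions, which matters when many cards leave the location or the value blank.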

How sustainable is Public Interest AI?

For the research project Public Interest AI, we develop AI prototypes that are designed to serve the public interest. For this post, we measured the electricity consumption for training these models. (Note that these measurements are anecdotal rather than representative.) Nevertheless, they should give some impression of the scale of CO2 emissions that can be expected from small or medium-sized machine learning projects.

The first prototype is intended to map (non-)accessible places. For this we use object detection, which automatically recognizes objects like stairs, steps, ramps, and stair rails (Fig 3). For this task we annotated a dataset beforehand and chose YOLOv5 as the computer vision model. YOLOv5 is a widely used state-of-the-art deep neural network, and the technique for using it is called transfer learning: a pre-trained model is fine-tuned on our accessibility dataset.

Fig 3 Object detection for mapping accessible places. We applied the trained YOLOv5s to a short video. 

The second prototype is going to support fact-checkers in their work against disinformation. Before checking potential disinformation, one has to find a claim to check. The purpose of this natural language model is to spot such claims automatically to lower the workload of human fact-checkers.

For this task, we tried different machine learning models. First, we used “old-school” models such as logistic regression and support vector machines, but we also used state-of-the-art models like an ensemble of transformers and a triplet network. The latter two are instances of transfer learning, too.

Fig 4 Electricity consumption for training several machine learning models on different datasets. For logistic regression, support vector machine, and triplet network, we used grid search for hyperparameter tuning. For YOLO we used the already implemented hyperparameter evolution. We took the documented hyperparameters for the ensemble model. The language models were trained on four different datasets: Checkthat (ca. 47,000 sentences), Claimbuster (ca. 24,000 sentences), Germeval (ca. 4,200 sentences) and Debatenet (ca. 3,700 sentences). The computer vision model was trained on a dataset containing about 6,500 images.

Fig 4 visualises the electricity consumption of the models. Two observations are central. First, the ensemble model has by far the highest electricity consumption. This is no surprise, since it is built from 60 individual transformers. Second, it is striking that the electricity consumption for training these models is still only a little higher than that of streaming one hour in 4k quality.
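How an electricity figure like those in Fig 4 is obtained depends on the measurement setup, which is not described here. As a rough illustration only, one common approach is to sample the power draw of the machine during training and integrate it over time. The sketch below does this with the trapezoidal rule; the sampling format and the function name `energy_kwh` are our assumptions.

```python
from typing import Sequence, Tuple


def energy_kwh(samples: Sequence[Tuple[float, float]]) -> float:
    """Integrate (time_in_seconds, power_in_watts) samples into kilowatt-hours.

    Uses the trapezoidal rule between consecutive samples, so the samples
    must be sorted by time. Fewer than two samples yield 0.0.
    """
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be sorted by time")
        joules += 0.5 * (p0 + p1) * (t1 - t0)
    return joules / 3.6e6  # 1 kWh = 3.6 million joules


if __name__ == "__main__":
    # Hypothetical readings: a constant 1000 W draw for one hour.
    print(energy_kwh([(0.0, 1000.0), (3600.0, 1000.0)]))
```

For an ensemble, the same integration simply runs over the whole training job, so the total grows with the number of member models that are trained.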

Where to train environmentally friendly?

As mentioned above, the same electricity consumption can cause different amounts of CO2 emissions depending on the geographical location. This is due to the local energy mix: renewable energy emits less than fossil fuels. Fig 5 visualises the emissions of the ensemble model depending on the geographical location. Even though we trained in Germany, we can estimate the amount of CO2 that would have been emitted if we had trained in other countries.
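The estimate described above boils down to multiplying the measured electricity consumption by the carbon intensity of the local grid. The following minimal sketch shows the arithmetic. The function names are ours, and the intensity values in the usage example are placeholders, not real country data and not the numbers behind Fig 5.

```python
from typing import Dict, Mapping


def emissions_kg(electricity_kwh: float, intensity_g_per_kwh: float) -> float:
    """CO2 emissions in kilograms for a given consumption and grid carbon intensity."""
    if electricity_kwh < 0 or intensity_g_per_kwh < 0:
        raise ValueError("inputs must be non-negative")
    return electricity_kwh * intensity_g_per_kwh / 1000.0  # grams to kilograms


def emissions_by_region(
    electricity_kwh: float, intensities: Mapping[str, float]
) -> Dict[str, float]:
    """Apply the same electricity consumption to several grid intensities."""
    return {
        region: emissions_kg(electricity_kwh, grams)
        for region, grams in intensities.items()
    }


if __name__ == "__main__":
    # Placeholder intensities in g CO2 per kWh, for illustration only.
    placeholder_intensities = {"region_a": 50.0, "region_b": 400.0}
    for region, kg in emissions_by_region(10.0, placeholder_intensities).items():
        print(f"{region}: {kg:.2f} kg CO2")
```

Because the consumption is held fixed, the ratio between two regions' emissions equals the ratio of their grid intensities, which is why the training location matters so much.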