“Hey Siri, how do you say ‘I love you’ in German?”
That might seem like an innocuous question to ask your phone or smart speaker. But the algorithms behind software that can answer a question like this have a dark secret: The process of teaching these models to understand human language has a startlingly large carbon footprint.
According to a new paper that will be presented at the annual Association for Computational Linguistics conference in Florence, Italy, later this summer, training one popular off-the-shelf machine translation model can produce as much carbon as five cars do over their entire lifetimes, fuel included.
For Emma Strubell, the paper’s lead author, the most shocking discovery came when she analyzed one of the recent models she had designed as part of her PhD work at the University of Massachusetts Amherst. While that algorithm’s carbon footprint–78,468 pounds of carbon dioxide–wasn’t quite as big as some of the others she assessed in the paper, it was still comparable to the carbon dioxide the average American emits in two years.
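The comparison holds up to a back-of-envelope check. Note that the per-capita figure below (roughly 36,000 pounds of CO2 per American per year) is an assumption based on commonly cited U.S. estimates, not a number from Strubell’s paper:

```python
# Sanity-check the "two years of an average American's emissions" comparison.
LB_PER_METRIC_TON = 2204.62

model_footprint_lb = 78_468          # Strubell's model, from the paper
avg_american_lb_per_year = 36_000    # assumed rough U.S. per-capita figure

model_footprint_t = model_footprint_lb / LB_PER_METRIC_TON
years_equivalent = model_footprint_lb / avg_american_lb_per_year

print(f"{model_footprint_t:.1f} metric tons of CO2")
print(f"about {years_equivalent:.1f} years of an average American's emissions")
```

That works out to roughly 35.6 metric tons, or a bit over two years of one person’s emissions, matching the paper’s framing.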
Machine learning isn’t the only ubiquitous technology with an immense carbon footprint. According to one study, technology is on track to account for 20% of all global emissions by 2040. Another recent study found that the world’s love of watching YouTube videos also has a serious carbon footprint: It’s equivalent to that of a city the size of Frankfurt or Glasgow, Scotland, over the course of a year.
Strubell’s work is part of a growing awareness around carbon emissions–and how to curb them–within the design and technology world. Designers and engineers are already reacting by trying to design technology that is greener and more sustainable. Green websites have optimized video and photos to consume less power, and they’re far simpler, which helps the sites load faster (and, coincidentally, helps people find what they’re looking for more quickly). One magazine’s website is entirely solar-powered, and it sometimes goes offline if the weather’s too cloudy.
Still, there are many simple things that big companies can do to reduce their emissions. Photo storage services like Apple’s iCloud and Google Photos could offer to delete duplicate photos (do you really need the 15 selfies you took when you were bored on the train that one time?). According to the YouTube study, the platform could decrease the carbon footprint of its videos by 3% if it simply stopped playing video when it runs in a background browser tab, since the user is likely only using YouTube to listen to music. That might sound insignificant, but that 3% represents the same annual carbon emissions as 50,000 cars.
That’s just web design. As AI algorithms begin to power more and more of the activities and services we use online, it’s important to also consider the environmental cost of developing them. Why do machine learning algorithms use up so much energy in the first place? In part, Strubell says, it’s because there’s been a trend in the last few years to take advantage of the huge data sets and data processing power available and simply use more data and more computing power to achieve a more accurate result. Previously, this wasn’t feasible because it would take too long to teach the model–instead, researchers would focus on smaller data sets. But recent advances in computing have made it a ready option.
For instance, a Google intern last year created an impressive neural network that could generate incredibly realistic images with lifelike textures. To create this stunning result, researcher Andrew Brock told me he simply threw a lot of computational power at the problem by increasing the complexity of the neural network and then showing the algorithm significantly more images. “The good news is that AI can now give you a more believable image of a plate of spaghetti,” data artist and researcher Jer Thorp wrote on Twitter at the time. He jokingly estimated: “The bad news is that it used roughly enough energy to power Cleveland for the afternoon.” By the joint calculations Thorp and I made at the time, the amount of energy Brock used for just one experiment could have powered an average American household for six months. Google says that the company is investing in more efficient algorithms, and in May announced a family of machine learning models that are 10 times more energy efficient than previous versions.
There’s a similar trend happening in Strubell’s own field, natural language processing, which aims to help computers understand human language. Strubell says that researchers have started to first teach their models how language works by feeding the algorithms billions of pages of text taken from the internet, an energy-intensive process. Then, the model is refined using data that’s been annotated by humans–which is much higher quality and uses less energy, but is much more expensive. Now, Strubell says, much of the new research on language processing is built on top of models with large carbon footprints.
Helping machines understand language is a crucially important problem, which you could argue makes it worth the carbon cost (it’s more debatable if we really need computers to generate realistic images). But Strubell points out that researchers routinely spend thousands of dollars and thousands of tons of carbon hunting for very small increases in accuracy.
“If all this carbon footprint and money is required to train the model, maybe that would be reasonable if it’s a groundbreaking, innovative model that’s changing the face of research, increasing accuracy, and really solving a problem,” she says. “All people care about is making the [accuracy] number higher at any cost. We shouldn’t be doing that. It’s just not responsible.”
Strubell believes that researchers should report, in any paper they write, a common measurement of how much computation was needed to train their algorithm. She also hopes that companies like Google, whose popular platforms people use to build machine learning models, will start supporting more energy-efficient algorithms, beginning with an easy way for developers to measure their computational cost. Strubell says there are ways to streamline algorithms so they don’t need as much training, but people don’t usually take the time to code those efficiency mechanisms themselves. Google, however, could integrate them into its TensorFlow platform so that developers could access the more advanced techniques with just a few lines of code, making it easier to design energy-efficient algorithms.
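The kind of accounting Strubell advocates can be sketched very simply: estimate a training run’s energy from hardware power draw and wall-clock time, then convert to CO2. Every constant below is an illustrative assumption, not a value from the paper or from any specific cloud provider:

```python
def training_footprint_kg_co2(gpu_watts: float, num_gpus: int,
                              hours: float, pue: float = 1.58,
                              kg_co2_per_kwh: float = 0.43) -> float:
    """Estimate CO2 emitted by one training run.

    pue: data-center power usage effectiveness (overhead multiplier, assumed).
    kg_co2_per_kwh: average grid carbon intensity (assumed).
    """
    kwh = gpu_watts * num_gpus * hours / 1000 * pue
    return kwh * kg_co2_per_kwh

# Hypothetical example: 8 GPUs drawing 250 W each, training for 72 hours.
print(f"{training_footprint_kg_co2(250, 8, 72):.0f} kg CO2")
```

Reporting a number like this alongside accuracy results would cost researchers almost nothing, which is exactly the point.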
Researchers usually aim to make algorithms as efficient as possible, which typically means as fast as possible. But Strubell hopes that her work might convince researchers to also consider the energy efficiency of their code. Instead of focusing on making bigger, heavier algorithms with more data, she hopes that the discipline will move toward more carbon-friendly, advanced machine learning techniques that are just as accurate as the ones that rely on brute power.
Ultimately, Strubell’s work raises the question of what problems are important enough to spend precious energy on.
“You should really only do this scale of research if you think the result is going to be really impactful,” Strubell says. “People should be more considerate of that.”