Facebook said today it has developed a new computing system for artificial intelligence research that is twice as fast and twice as efficient as the hardware it previously used.
Machine learning and artificial intelligence are together becoming the lifeblood of new applications across the business and research communities. But even though that shift has been driven largely by more powerful and more efficient computers, the industry is reaching the limits of what those computers can do.
Increasingly, Facebook is developing elements of its business centered on artificial intelligence, and the social networking giant’s ability to build and train advanced AI models has been tied to the power of the hardware it uses.
Until now, the company has been relying on off-the-shelf computers, but as its AI and machine-learning needs have progressed, it has determined that it can no longer depend on others’ hardware. That’s why Facebook designed the “next-generation” computing hardware it has code-named “Big Sur.”
The new Open-Rack-compatible system, designed over 18 months in conjunction with partners like Quanta and processor manufacturers like Nvidia, features eight graphics processing units (GPUs) of up to 300 watts apiece. Facebook says the new system is twice as fast as its previous hardware, which doubles the efficiency of neural network training and lets the company explore neural networks twice as large as before.
“Distributing training across eight GPUs,” Facebook AI Research (FAIR) engineering director Serkan Piantino wrote in a blog post, “allows us to scale the size and speed of our networks by another factor of two.”
Piantino added in the post that Facebook plans on open-sourcing the Big Sur hardware and will submit the design materials to the Open Compute Project (OCP) in a bid to “make it a lot easier for AI researchers to share techniques and technologies” and for others to help improve the OCP.
“The bigger you make those neural networks, the better they work, and the better you train them,” said Yann LeCun, the director of FAIR, during a press conference call earlier this month. “Very quickly, we hit the limits of limited memory.”
On the call, Piantino said that one of the major design specifications for Big Sur machines was that they be “power efficient, both in terms of the power they use, and the power needed to cool them in our data centers.”
For Facebook, having these more powerful and more efficient computers is crucial. That’s because, as Piantino told reporters, “our capabilities keep growing, and with each new capability, whether it’s computer vision, or speech, our models get more expensive to run, incrementally, each time.”
Also, he said, as the FAIR group has moved from research to capability, it has seen product groups from across Facebook reach out about collaborations.
As LeCun put it, “Part of our job at [FAIR] is to put ourselves out of a job.”