Ask the artificial intelligence system created by German startup Aleph Alpha about its “Lieblingssportteam” (favorite sports team) in German, and it riffs about Bayern Munich and former midfielder Toni Kroos. Quiz the neural network on its “equipo deportivo favorito,” and it replies in Spanish about Atlético Madrid and its long-ago European competition success. In English, it’s the San Francisco 49ers.
Answering a question it has never seen, matching language to culture, and peppering answers with supporting facts have until recently been beyond the ken of neural networks, the statistical prediction engines that are a pillar of artificial intelligence (AI). Aleph Alpha's approach, like others of its kind, represents a shift in AI away from "supervised" systems taught, through labeled examples, to complete tasks such as identifying cars and pedestrians or flagging disloyal customers. This new breed of "self-supervised learning" networks can find hidden patterns in data without being told in advance what they're seeking, and apply knowledge from one field to another.
The results can be uncanny. OpenAI's GPT-3 can write lengthy, convincing prose; Jurassic-1 Jumbo, from Israel's AI21 Labs, suggests ideas for blog posts on tourism or electric cars. Facebook uses a language-understanding system to find and filter hate speech. Aleph Alpha is fine-tuning its general AI model with specialized data in fields such as finance, automotive, agriculture, and pharmaceuticals.
“What can you do with these models beyond writing cool text that seems like a human has written it?” says Aleph Alpha CEO and founder Jonas Andrulis. The serial entrepreneur sold a prior company to Apple, stayed three years in R&D management, then built his current venture in Heidelberg. “These models will free us from the burden of banal office work, or government busywork like writing reports that no one reads. It’s like a capable assistant—or an unlimited number of smart interns.”
Self-supervised systems turn traditional software development on its head: Instead of tackling a specific problem in a narrow field, the new AI architects first build their self-learning models, let them ingest content from the internet and private datasets, and then discover what problems to solve. Practical applications are starting to emerge.
For white-collar office workers, for example, Aleph Alpha is teaming up with workflow automation software maker Bardeen to explore how users could enter free-text commands in different languages to generate useful code without knowing how to program.
As a measure of the field’s progress, just two years ago the state-of-the-art neural network—a language-understanding system called BERT—had 345 million parameters. Aleph Alpha, which closed a €23 million ($27 million) funding round in July, is training a 13-billion-parameter AI model on Oracle Cloud Infrastructure (OCI), using hundreds of Nvidia’s most powerful graphics processing units connected by high-speed networking. A second Aleph Alpha model holds 200 billion parameters.
Cloud computing, such as OCI, is removing a big development constraint. “Artificial general intelligence is limited by computing power, and it’s limited by training the systems,” says Hendrik Brandis, cofounder and partner at EarlyBird Venture Capital in Munich, which led Aleph Alpha’s latest funding round. “The processing capacity that’s available in the cloud will lead to an AGI solution, and that will happen at some point, though I don’t want to set a time.”
ACCESS AND ETHICS
Along with cloud computing access, self-supervised systems have ridden a tenfold increase in GPU computational capacity over the past four years, the advent of so-called transformer models, which take advantage of that parallel processing, and the availability of far more training data online. They’ve also sparked debates about who has access to these models and the computing resources that power them, and how fairly they behave in the real world.
Interpreting X-rays and ultrasounds quickly in a pandemic, suggesting lab tests, writing legal briefs, and retrieving relevant case law and patents are all potential applications, according to an August report by Stanford University’s Center for Research on Foundation Models, formed this year to study the technological and ethical implications of self-supervised AI systems.
“We’re seeing that a single model can be adapted to a lot of different applications,” says Percy Liang, the center’s director and a computer science professor at Stanford. “But any security problems and biases also get inherited. That’s the double-edged sword.”
Politicians and researchers have been advocating for more open access to foundation models and the algorithms that underlie them. So far, research on building large-scale models has largely been the province of the biggest technology companies: Microsoft and its partner OpenAI, Google, Facebook, and Nvidia. China’s government-sponsored AI academy in Beijing released a gargantuan model with 10 times as many parameters as GPT-3.
“I don’t like this. At some point, certain things need to be in the public sector, or we’ll lose democratic access,” says Kristian Kersting, a computer science professor and head of the AI and ML lab at the Technical University of Darmstadt in Germany. Kersting is teaming up with Aleph Alpha on a doctoral program that combines work and study, in part to help broaden access to these models.
Foundation models can also reproduce biases they find online and have the potential to mass-produce hate speech and disinformation, the Stanford report found. Researchers have shown they can be trained to generate malicious code.
Andrulis is positioning Aleph Alpha, a member of the Oracle for Startups program, as a European innovator that can help ensure the Continent produces its own foundation models that businesses and governments can use. It’s training its system in English, German, Spanish, French, and Italian, and betting it can win contracts as an alternative to foundation models built in the United States and China.
The climate may be right for new approaches. More than half of companies have adopted AI in at least one business function, according to McKinsey & Company’s The State of AI in 2020 report, based on 2,395 global respondents. In healthcare, pharmaceuticals, and automotive, more than 40% of respondents reported increasing AI investments during the pandemic. But just 16% said they’d taken deep learning—the branch of AI that uses neural networks to make predictions, recognize images and sounds, or answer questions and generate text—beyond the pilot phase.
Today’s technologies, from cloud resources to more sophisticated training techniques, mean the time is ripe to move self-learning AI from experiment to business reality.
“This is a new generation of model, and in order to train those you need a new generation of hardware—the old GPU clusters aren’t sufficient anymore,” says Andrulis. “On the industry side we have raised a lot of capital and partnered with Oracle. We’re building a way to translate an impressive playground task into an enterprise application that creates value.”
Aaron Ricadela is a senior communications director at Oracle. He was previously a journalist at Bloomberg News, BusinessWeek, and InformationWeek, and his work has appeared in The New York Times, Wired, Focus, and the Süddeutsche Zeitung.