The ability of computers to understand and generate language took a huge leap forward in 2017 when researchers at Google developed new natural-language AI models called Transformers. Some of the experts who built and trained those seminal models have since branched out on their own by founding the Toronto-based startup Cohere, which today announced a new $40 million Series A funding round.
The technology that undergirds Cohere’s natural-language processing models was originally developed by the Toronto-based Google Brain team. Two of that team’s members, Aidan Gomez and Nick Frosst (along with a third cofounder, Ivan Zhang), started Cohere two years ago to further develop and commercialize the models, which are delivered to customers through an API.
Cohere is backed by neural network pioneer and Turing Award winner Geoffrey Hinton, who led the Toronto Google Brain team, as well as some other big names in the AI world like Stanford computer science professor Fei-Fei Li. “Very large language models are now giving computers a much better understanding of human communication,” Hinton said in a statement to Fast Company. “The team at Cohere is building technology that will make this revolution in natural language understanding much more widely available.”
In the last year, critics of large NLP models, which are trained on huge amounts of text from the web, have raised concerns about the ways that the technology inadvertently picks up biases inherent to the people or viewpoints in this training data. Such critiques gained steam after Google controversially pushed out famed AI researcher Timnit Gebru, in part due to a paper she coauthored analyzing these risks. Cohere CEO Aidan Gomez says his company has developed new tools and invested a lot of time into making sure the Cohere models don’t ingest such bad data.
As the debate continues about the ethics of these models and how they’re built, Cohere is focusing on making NLP models more accessible to companies beyond the tech giants.
Gomez says one of the problems with Transformer models is that they require massive amounts of training data and compute power to run. “No one can get access to these high-quality natural-language models; those who can are the ones with access to a quarter-billion dollar supercomputer, so all the big FAANG (Facebook, Amazon, Apple, Netflix, and Google) companies,” Gomez says. “What we want to do is foot the cost of that supercomputer and give access to all these organizations that otherwise couldn’t build products or features on this technology.”
For example, Cohere is providing the NLP capability for Ada, a recent unicorn company in the chatbot space. Ada has experimented with the Cohere natural-language models to match customer chat requests with available support information. Rather than trying to anticipate all the possible wordings of a request, Cohere’s model tries to understand the intent behind it, Gomez says.
That understanding of language is the result of major advances in natural-language processing over the past five years. It started with the development of the Transformer models at Google Brain. At the architectural level, the models were designed to glean not just the meaning of individual words but also the meanings of words in the context of other words, Gomez tells me.
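The mechanism Transformers use to read a word in the context of other words is known as attention. The sketch below is a heavily simplified illustration, not Google's or Cohere's code: the three word vectors are made up for the example, and the learned projection matrices that real Transformers apply before comparing words are left out. Each word's vector is replaced by a weighted blend of all the vectors in the sentence, with more weight going to the words it most resembles.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Blend each word vector with every vector in the sentence.

    Weights come from scaled dot products between word vectors.
    Simplification: no learned query/key/value projections.
    """
    dim = len(vectors[0])
    mixed = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        mixed.append(
            [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
        )
    return mixed

# Hypothetical 2-dimensional vectors for three words (invented for illustration).
sentence_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual_vectors = self_attention(sentence_vectors)
for original, contextual in zip(sentence_vectors, contextual_vectors):
    print(original, "->", [round(x, 3) for x in contextual])
```

After this step, no word's vector stands alone: each one carries some information from its neighbors, which is the sense in which the model represents words in context.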
Building on that architecture, a different team of Google researchers developed a novel way of training the Transformer models. Called BERT (Bidirectional Encoder Representations from Transformers), it is now used to understand queries in almost all of Google’s search functions. The researchers first trained the Transformer model by allowing it to process massive amounts of text from the web. They then fed in full sentences with some words intentionally left out and challenged the model to find the best words to fill in the blanks.
“It really learns to understand language because it needs to learn to understand contextually what fits in that sentence—a logical choice of word to insert there,” Gomez says. While looking for the right word, the model must learn the nuances of many potential candidates to arrive at a most likely winner.
By searching for the best candidate words, the model learns a surprising amount about words and the things and ideas they represent.
“The representations [the model’s encoding of the meaning and features of words] that come out the other side are incredibly rich,” Gomez says, “because their whole purpose is to be able to predict those blanks, so they really need to understand what the options are.”
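The fill-in-the-blank exercise can be illustrated with a toy example. The sketch below is not BERT and involves no neural network: it simply counts, in a tiny made-up corpus, which words have appeared between a given left-hand and right-hand neighbor, then proposes the most frequent ones for the blank. The corpus, the "[MASK]" placeholder, and the function names are assumptions chosen for the example.

```python
from collections import Counter, defaultdict

# Tiny invented corpus; real models train on massive amounts of web text.
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat slept on the mat",
    "the dog sat on the mat",
]

def build_context_counts(sentences):
    """Count which word appears between each (left, right) pair of neighbors."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(words) - 1):
            counts[(words[i - 1], words[i + 1])][words[i]] += 1
    return counts

def fill_blank(sentence, counts, blank="[MASK]"):
    """Return candidate words for the blank, most frequent first."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    i = words.index(blank)
    context = (words[i - 1], words[i + 1])
    return [word for word, _ in counts[context].most_common()]

counts = build_context_counts(CORPUS)
print(fill_blank("the dog [MASK] on the mat", counts))
print(fill_blank("the cat [MASK] on the mat", counts))
```

The toy looks at words on both sides of the blank, which is the "bidirectional" idea in BERT's name. A real model weighs far more context than two neighboring words and learns rich representations instead of keeping raw counts.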
The Cohere models bear some similarities to the GPT-3 NLP model, which shocked some people with its ability to create human-sounding text upon its release last year. That model, developed by San Francisco-based OpenAI, is architecturally very similar to the Cohere models. Both are pre-trained using massive amounts of text from the web, and both are delivered through an API (although OpenAI granted an exclusive license to Microsoft to use and alter the model’s underlying code).
But there are big differences, too, Gomez explains. GPT-3 is a “generative” model designed to create text moving from left to right based on a user-provided prompt, similar to an extremely powerful autocomplete function. But that’s just one of many natural-language functions, Gomez says. Cohere, he says, offers a platform containing a “full stack” of NLP functions, including sentiment classification, question answering, and text classification.
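The autocomplete comparison can be made concrete with a toy sketch. The code below is not GPT-3 and uses no neural network: it counts which word most often follows each word in a small invented corpus, then extends a prompt one word at a time, always moving left to right. The corpus and function names are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Invented corpus for illustration only.
CORPUS = [
    "the model writes text one word at a time",
    "the model reads the prompt first",
    "the model writes text from left to right",
]

def build_next_word_counts(sentences):
    """Count, for every word, which words follow it and how often."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

def autocomplete(prompt, counts, max_new_words=5):
    """Extend the prompt left to right with the most frequent next word."""
    words = prompt.split()
    if not words:
        return prompt
    for _ in range(max_new_words):
        candidates = counts.get(words[-1])
        if not candidates:
            break  # the last word never appeared with a successor
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

next_word_counts = build_next_word_counts(CORPUS)
print(autocomplete("the model", next_word_counts, max_new_words=2))
```

Each new word depends only on what came before it, never on what comes after. That is the key contrast with the fill-in-the-blank approach, which looks at context on both sides.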
As critics have pointed out, one drawback of training models is the risk that they might learn from things they shouldn’t, Gomez says. Along with picking up biases, as GPT-3 has been known to do because it was trained on text from across the internet, models may also absorb misinformation inadvertently included in the training data. A new group at Stanford (overseen by Fei-Fei Li, as it happens) has formed to study the risks associated with “foundation” technologies like BERT and GPT-3.
To address the risks, Cohere’s engineers have implemented quality control tests to look for any issues with the model before release, and the company continues to monitor its models after launch as well. In addition, Gomez says Cohere will publish “data statements,” which will include information about training data, its limitations, and any risks, a concept first popularized by Gebru. Cohere has also established an external Responsibility Council that will help oversee the safe application of the company’s AI. The company declined to share who is part of the council.
Gomez tells me his company will use the fresh cash to grow its 50-employee headcount and expand its NLP platform to serve new industries, such as healthcare and financial services. Index Ventures led the funding round, and Index partner Mike Volpi joins the Cohere board.
This story has been updated with more information about Cohere’s approach to responsible AI.