At the end of 2018, at a biology conference and competition known as Critical Assessment of Structure Prediction, Alphabet’s London-based AI subsidiary DeepMind made an incredible showing of its ability to infer the physical structure of proteins based on their genetic code. DeepMind’s AI, AlphaFold, won the competition, making the best predictions in 43 out of 90 tests. The company had never participated in the conference before.
While impressive, the technology wasn’t yet capable of replacing the existing expensive and time-consuming experimental methods for determining what these proteins look like. However, its latest software comes close.
In November, AlphaFold again outperformed all the other competing groups at CASP. The technology solved protein structures other labs had been working on for years. Scientists think the technology could have immense implications for the way proteins are studied.
DeepMind is still in the process of validating its latest technology. But it’s already working with academic and industry partners to understand how the technology can have the most impact. One obvious area could be drug discovery. Understanding the way proteins fold into three-dimensional shapes is critical to understanding how to design drugs. Scientists need to know how various molecules lock onto proteins and change the way they operate in the body in order to propose potential medicines.
Now DeepMind is validating its technology and writing a paper about how it all works. In a conversation with Fast Company, its principal and team lead Pushmeet Kohli explains how the technology has been streamlined to make better predictions about the 3-D structure of the protein and what that means for its future. This Q&A has been edited for brevity and clarity.
Fast Company: How has the AlphaFold technology changed since last year?
Pushmeet Kohli: In the previous work, the neural network was getting the sequence of a protein and predicting which particular amino acids would be close to each other. So it was predicting this histogram or this distance matrix, which was essentially saying that this amino acid would be close to this amino acid and so on. Then a second module was essentially using that information to fit a 3-D structure to that information.
In the new system, that two-stage process doesn’t happen anymore. It is one neural network, which just takes the sequence and the alignment and directly makes a prediction about the structure of the protein.
That neural network is modulated with the ability to generalize. In school, we learn the concept of addition. We don’t memorize that three plus four is equal to seven or 23 plus 17 is equal to 40. We understand the concept of what addition is and how you can add any two numbers. That conceptual understanding is what we have tried to bake into the neural network so that it does not memorize that “for this sequence it is this structure.” It tries to understand what concepts are at play, so that it can predict any protein, not just the proteins that were included in the training data.
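The distance matrix Kohli mentions for the earlier system can be illustrated with a small sketch. This is not DeepMind's code, and it runs in the opposite direction from the model: AlphaFold's first version predicted distances from a sequence, while the snippet below simply computes them from 3-D coordinates that are already known, to show what such a matrix contains. The toy coordinates, the function names, and the 8-angstrom contact cutoff are all assumptions chosen for illustration.

```python
import math


def distance_matrix(coords):
    """Return pairwise Euclidean distances between residue positions."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_pairs(dist, cutoff=8.0, min_separation=2):
    """List residue pairs (i, j) closer than `cutoff`.

    Pairs fewer than `min_separation` positions apart in the sequence are
    skipped, since chain neighbors are always close. Both defaults are
    illustrative assumptions, not values taken from the article.
    """
    n = len(dist)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if dist[i][j] < cutoff
    ]


# Hypothetical toy "protein": five residues, one (x, y, z) point each,
# laid out so the chain bends back on itself.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
]

dist = distance_matrix(coords)
contacts = contact_pairs(dist)

for row in dist:
    print(" ".join(f"{d:5.1f}" for d in row))
print("contacts:", contacts)
```

Each entry in the matrix says how far apart two amino acids are. In the two-stage design Kohli describes, a second module then had to find a 3-D shape consistent with those distances.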
FC: What proteins can it not predict?
PK: It still bases its understanding on evolutionary history. So it basically looks at all the proteins that are known and sees how certain residues interact. It’s using that information to make predictions about the structure.
FC: How can this technology help researchers or industry biologists?
PK: Let’s say you’re trying to understand the mechanisms of disease. Even if you take the example of SARS-CoV-2 [the virus that causes COVID-19], one of the most important elements was to first sequence the virus and then to understand its 3-D structures, after which we were able to understand how the virus or the spike protein was interfacing with the cells of the human body.
With this technology, we would be able to accelerate that process. Experimentalists would now be able to do a better job and be faster designing this experiment rather than having to wait for a couple of years while somebody came up with the 3-D structure of the proteins.
FC: Speed is a big deal in drug discovery in particular. Clinical trials take a long time and frequently fail. Could this technology help success rates?
PK: Absolutely. Finding the structure of a protein is so time-consuming. Drug designers have to be very careful as to which particular protein they would like to understand and invest that amount of money and effort to be able to find the structure of that group.
Now if this new system allows drug researchers to understand the 3-D structure not only for one particular target but for a large number of targets, that definitely opens up a bigger window for them to understand not just how a specific drug is going to interact with a protein, but how that compares with how the drug would interact with many other proteins.
Our hope is that this knowledge will allow experiments to become more effective. But this is something that has to be validated over time. We hope to see how this technology interacts with various use cases like drug development.
FC: What is the next step for you as researchers?
PK: For these past three to four years, our focus has been on this extremely important problem and pushing it forward. We’re very happy that we have been able to make this advancement. Of course there are many problems that remain, and that’s what the focus of our research team is: improving the system further, not only in accuracy, but also in coverage. There are many other questions that our system is still unable to answer.
For example, we are making predictions about a static structure of a protein, but proteins are not like Lego pieces. They are flexible, so they can move around. How do they flex and how do those movements allow them to bind to other proteins? When you are thinking about biologics, where proteins act with other proteins, like insulin, then the interaction becomes more sophisticated and the flexible structure needs to be taken into account. Answering those questions is a key scientific and research challenge that we intend to take up next.
We are also investigating the most impactful way for this technology to be used, not just in commercial but also in academic partnerships. We received this very encouraging news in the form of the competition results only in the past few weeks. Over the next few weeks, our team will be making plans for how to form partnerships with academic, industrial, and commercial partners. The key thing that we are trying to optimize is the impact that this technology can have in the world.