By PAUL VON BÜNAU
DeepMind AI’s impressive protein structure prediction accuracy hints at a huge future role in biomedical research – but turning potential into performance will partly depend upon our faith in AI’s conclusions
DeepMind’s success in creating an Artificial Intelligence (AI) tool for protein structure prediction – AlphaFold – spawned a flood of exhilarated headlines. The DeepMind team claimed to have found “a solution to a 50-year-old grand challenge in biology”.1 The journal Nature proclaimed: “It will change everything”.2 Behind the headlines, however, experts engaged in an instant and heated debate on the question of whether AlphaFold had indeed “solved” the protein folding problem.
Why such excitement?
At the 14th biennial Critical Assessment of protein Structure Prediction (CASP) contest, AlphaFold, a neural network trained on publicly available protein sequence and structural data, performed astonishingly well. For the 100 proteins included in the CASP14 challenge, the average overlap between AlphaFold’s predictions and experimentally derived structures was 90 on a 100-point scale. Many scientists who had worked in the field for decades were baffled by its accuracy.
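The 100-point scale is not named here, but CASP's headline score is GDT_TS (global distance test, total score), which averages the percentage of residues lying within 1, 2, 4 and 8 Å of their experimental positions. The following is a minimal sketch of that idea, not CASP's actual procedure: it assumes the predicted and experimental C-alpha coordinates are already superimposed and listed in the same residue order, whereas the real score searches over many superpositions. The function name and the toy coordinates are illustrative only.

```python
import math

def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on a 0-100 scale.

    Assumption: both coordinate lists are already superimposed and hold
    one (x, y, z) tuple per residue, in the same order.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Distance between each predicted residue and its experimental position.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # Fraction of residues within each distance cutoff (in angstroms).
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    # Average the four fractions and express the result as a percentage.
    return 100.0 * sum(fractions) / len(cutoffs)

# Toy example: four residues, displaced by 0.5, 1.5, 3.0 and 9.0 angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]
score = gdt_ts(predicted, experimental)
```

On this kind of scale, a score of 100 means every residue sits within 1 Å of its experimental position under the chosen superposition.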
Following AlphaGo’s ingenious moves in its victory against human Go champions – interpreted, by some, as showing AI “creativity” for the first time3 – AlphaFold’s performance signals DeepMind’s second AI aha-moment. With potential applications in biology, bioengineering and drug discovery1,2, the ability to predict protein structure from sequence is sure to have more real-world applications than acing Go.
However, for all the hype surrounding AlphaFold, the gap between protein structure prediction in a competition and having a useful tool for biomedical research might turn out to be a significant one. Bridging it will require a new level of communication between scientific disciplines.
In structure lies understanding
Proteins are some of the most versatile players in biology’s toolbox of functional units, with essential roles in cellular infrastructure, energy production and signalling functions.
A protein’s structure determines its properties – for example its rigidity or flexibility, its motility and its ability to interact with other molecules – and these shape a protein’s unique role within the cellular machinery. Structural knowledge is a powerful tool for advancing our knowledge of basic biology, understanding evolution of different species, and tackling human diseases.
Proteins fold within (milli)seconds and largely unaided, which suggests that a protein’s unique 3D information must be encoded in its 1D structure: the sequence of its basic building blocks, amino acids.4 A typical protein contains between 50 and 2000 of these building blocks, each drawn from 20 canonical amino acids. As a result, even for a small protein, the combinatorial space of folding possibilities explodes; “bottom-up” computation, based on the laws of physics, becomes intractable.
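To get a feel for the scale of this explosion, here is a rough back-of-the-envelope sketch. The 20 amino acids and the chain lengths come from the text above. The three conformations per residue and the sampling rate of 10^13 conformations per second are illustrative assumptions in the spirit of Levinthal's classic argument, not measured values. The code works with base-10 logarithms because the raw numbers overflow ordinary floating point.

```python
import math

AMINO_ACIDS = 20             # canonical amino acids (from the text)
STATES_PER_RESIDUE = 3       # assumption: conformations available to each residue
SAMPLES_PER_SECOND = 1e13    # assumption: conformations tried per second
SECONDS_PER_YEAR = 3.156e7

def log10_sequence_space(length):
    """log10 of the number of distinct sequences of a given length."""
    return length * math.log10(AMINO_ACIDS)

def log10_conformations(length):
    """log10 of the number of conformations under the per-residue assumption."""
    return length * math.log10(STATES_PER_RESIDUE)

def log10_brute_force_years(length):
    """log10 of the years needed to enumerate every conformation once."""
    return (log10_conformations(length)
            - math.log10(SAMPLES_PER_SECOND)
            - math.log10(SECONDS_PER_YEAR))

for length in (50, 100, 2000):
    print(f"{length:>4} residues: "
          f"10^{log10_sequence_space(length):.0f} sequences, "
          f"10^{log10_conformations(length):.0f} conformations, "
          f"10^{log10_brute_force_years(length):.0f} years to enumerate")
```

Even under these generous assumptions, exhaustive enumeration is out of reach for all but the shortest chains, which is why brute-force search is not a viable route to structure.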
While this complexity has hindered robust computational prediction in the past, concerted efforts such as CASP and FoldIT have improved overall performance. At CASP14, AlphaFold surpassed the field.
Has the protein folding problem been solved?
Does AlphaFold’s accuracy in predicting protein structures at CASP14 mean the protein folding problem is solved? How might we define what “solved” even means?
On a practical level, 100% accuracy in structure prediction from all possible sequences would be a definite “yes”. AlphaFold still falls short of this level of perfection, reaching its impressive accuracy for only about two thirds of the proteins tested.
On a conceptual level, we could consider understanding the forces that compel a protein to assume its 3D shape to be a “solution”. Some have argued that AlphaFold’s black-box nature in itself precludes deriving a true solution – at least one that is comprehensible to humans.
A further complication arises from the fact that not all protein structures can be determined experimentally. Even if we had a (near-)perfect AI prediction tool, we currently lack the means to fully validate its accuracy. Without a complete understanding of the underlying principles, we won’t know whether the AI errs or whether the experimental structure is wrong. Indeed, some of AlphaFold’s predictions may only appear “inaccurate” because the corresponding experimental structures rest on misinterpreted data.
While 100% accuracy may not be a realistic goal, practical milestones pave the way towards perfection. AlphaFold’s performance certainly constitutes such a milestone. Just how useful it will be is dependent upon how it is applied in the realms of biomedical research, bioengineering and drug discovery.
Employing AI in biomedical research and drug discovery
Experimental techniques have so far helped us derive the structures of some 170,000 of the roughly 200 million known proteins, but these methods are costly and time-consuming. There is certainly room for computational prediction methods to enhance the process.
To make a true impact, though, scientists of different disciplines need to identify the research questions that can realistically be tackled by AI. As DeepMind’s chief executive, Demis Hassabis, says: “We’re just starting to understand what biologists would want.”2
Application I: fostering disease understanding
AI-based protein structure prediction could aid a better understanding of protein misfolding diseases, including a number of rare genetic diseases, neurodegenerative diseases such as Parkinson’s, and prion diseases.6 While this is not likely to directly benefit patients, better disease understanding should improve diagnosis and treatment in the long run.
Application II: deciphering pathogenic proteins
Humans will continue to face the emergence of novel pathogens. To counteract those threats, researchers need to understand how pathogens infect humans, how they spread, and how they interact with the immune system. Protein structure prediction can help to identify exposed epitopes and druggable sites.7 As we witnessed in 2020, speed is an important factor in a pandemic response. Here, AlphaFold has already proved its potential worth: faced with a bacterial protein that researchers at the Max Planck Institute for Developmental Biology in Tübingen had spent a decade deciphering, AlphaFold solved the structure within half an hour.2
Application III: improved efficacy in drug discovery
Structure-based drug discovery, such as screening methods which identify drug-target interactions based on in-silico predicted protein structures, could save time and money in the identification of new drugs.8,9 Structural predictions of proteins involved in drug metabolism, and their interactions with drug candidates, could improve drug safety and reduce animal testing. Moreover, improved understanding of how proteins fold from their amino acid sequence could herald a new age of protein engineering, including therapeutic proteins.
Although AlphaFold has the potential to tackle many of those challenges in future applications, caveats remain with the current version.
AlphaFold accurately predicted membrane-bound proteins – which constitute a major class of drug targets and are difficult to assess by X-ray crystallography – but its prediction of structures within protein complexes has been less impressive. Many of the dynamic signalling processes within cells happen in complexes. Those processes are often perturbed in disease, making the signalling proteins another important class of drug targets.
Moreover, the usefulness of AlphaFold and similar tools is still confined to proteins for which experimental structures can be derived, since these serve as training sets to teach the algorithms and as validation sets to assess their performance.
Despite these shortcomings, it is important to bear in mind AlphaFold’s rapid evolution since the then-record ~60% accuracy it achieved at the 2018 CASP13 challenge.
AlphaFold’s performance has given us a taste of what AI could do for biology and drug discovery within the foreseeable future. Yet to leverage the full potential of this impressive tool, the communication gap between disciplines must be bridged. Only then can we tweak the AI towards the areas most relevant for biomedical research.
1. DeepMind’s blog on AlphaFold’s CASP14 performance, which gives an overview of the protein folding problem, AlphaFold’s basic architecture and features, as well as initial ideas on real-world applications. It also includes a short video explaining the technology.
2. Nature 588, 203-204 (2020). Short Nature article that discusses AlphaFold’s performance at CASP14, featuring quotes from different researchers working in the field on AlphaFold’s potential impact.
3. DeepMind’s blog on AlphaGo, discussing the Chinese strategy game Go, AlphaGo’s evolution and its (stunning) success. It also includes a link to the 130-minute movie on AlphaGo.
4. Nature Education article, which covers the basics of protein structure and protein folding.
6. Nature Education article, which covers the basics of protein (mis)folding and highlights the roles of misfolded proteins in neurodegenerative diseases such as Alzheimer’s and Parkinson’s, and in prion diseases.
7. Article discussing how structural understanding of pathogenic proteins can aid development of drugs and vaccines, using the example of the Zika virus.
8. Article describing different techniques in structure-based drug discovery, and the roles that computational techniques, and especially AI, can play in the identification of new drugs based on structural knowledge.
9. Detailed article describing the basics of protein folding, protein structure prediction and protein design, highlighting a number of recent developments in the field (as of 2019).