A new era of biological research has been unlocked, with an artificial intelligence (AI) predicting the 3D shape of nearly every protein known to science – just one year after its first data release.
Thanks to AlphaFold, an AI tool developed by the Google-owned AI company DeepMind, more than 200 million protein structures have now been shared online in a free-to-access, searchable database, called AlphaFold DB.
The accomplishment paves the way for untold avenues of scientific exploration into proteins, the building blocks of life. And researchers are giddy with excitement.
"Determining the 3D structure of a protein used to take many months or years, it now takes seconds," cardiologist Eric Topol from the Scripps Research Translational Institute explained in a statement about the data release.
"With this new addition of structures illuminating nearly the entire protein universe, we can expect more biological mysteries to be solved each day."
In collaboration with scientists at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), DeepMind unveiled its first batch of AlphaFold predictions in July 2021.
Heralded as a revolutionary tool that would transform biological research and accelerate drug discovery, AlphaFold predicts the 3D shape of proteins based on their amino acid sequences.
Linked together in chains, amino acids form long proteins that fold into pleated sheets and twisting ribbons.
By understanding the shape into which any given protein folds, scientists can work out how that protein operates and what its main role is inside cells.
AlphaFold was designed to accelerate that process, providing in this latest data release more than 200 million predicted structures of proteins found in plants, bacteria, animals, and other organisms.
"That hope has become a reality far quicker than we had dared to dream," DeepMind chief executive Demis Hassabis said in a statement about the latest data release.
"A big day for #AI in life science. Release of >200 million predicted 3D protein structures from open-source #AlphaFold, nearly the entire protein universe. See: https://t.co/gjASHqACqa @DeepMind," Eric Topol (@EricTopol) wrote on Twitter on July 28, 2022.
Already, researchers have used AlphaFold's first batch of predictions to refine their understanding of deadly diseases such as malaria, opening the door to improved vaccines, and to solve biological puzzles about behemoth proteins that have stumped scientists for decades.
Not to mention identifying never-seen-before enzymes that could help up-cycle plastic pollution.
"AlphaFold has sent ripples through the molecular biology community," said Sameer Velankar, a structural biologist who heads up EMBL-EBI's Protein Data Bank in Europe.
"In the past year alone, there have been over a thousand scientific articles on a broad range of research topics which use AlphaFold structures; I have never seen anything like it.
"And this is just the impact of 1 million predictions," Velankar added. "Imagine the impact of having over 200 million protein structure predictions openly accessible in the AlphaFold database."
Although the open-source AlphaFold software has been available to researchers since its release last year, having millions of predicted protein structures at their fingertips in a searchable database will undoubtedly expedite research.
According to EMBL-EBI, around one-third of the more than 214 million predictions have been classed as highly accurate, on par with protein structures derived from the usual experimental methods, such as X-ray crystallography and cryo-electron microscopy.
For decades scientists have painstakingly inferred molecular structures from the fuzzy pictures these methods produce – perhaps the most famous being Rosalind Franklin's image of helical DNA.
The quality of AlphaFold's predictions varies, however, and they might be less accurate for rarer proteins that scientists know little about. In some cases, its predicted structures might be best used to help make sense of experimental data.
Despite the gargantuan data dump, there's still a whole lot that AlphaFold DB doesn't capture, including how proteins interact once assembled.
Microbial proteins identified from traces of genetic material in soil and seawater are also not in the database – yet these microorganisms represent an untapped resource of potent compounds, since scientists have cataloged only a tiny fraction of all microbial life on Earth.
Some scientists have also raised concerns about the accessibility of the AlphaFold database and its staggering 23 terabytes of contents, which some research teams may struggle to use given the costly computing power and cloud-based storage that sophisticated data analyses would demand.
Nonetheless, the impending benefits to human health – which DeepMind says it has carefully weighed against potential bioethical risks – are so grand they are almost unimaginable.
"I expect that this latest update will trigger an avalanche of new and exciting discoveries in the months and years ahead," structural biologist and EMBL-EBI senior scientist Dame Janet Thornton told The Guardian. "And this is all thanks to the fact that the data are available openly for all to use."
DeepMind and EMBL-EBI will continue to refresh the AlphaFold database periodically.