AI Has Just Caused a Revolution in Biology

AlphaFold’s Predicted Structure for Q8I3H7, a protein that may protect the malaria parasite against immune system attack.

On the tree of human knowledge, there are fundamental problems that, once solved, unlock entire new branches of progress that were otherwise impossible to explore. A few years ago, one such problem was largely solved: predicting the 3D structure of proteins from their one-dimensional amino acid sequences, commonly referred to as the “protein folding” problem. I am writing this short post for those who may not realize the scientific miracle unfolding beneath our feet. To fully appreciate why recent breakthroughs are so monumental, let’s take a quick tour through cellular biology, explore how proteins function, and discover why accurately predicting their 3D forms is so transformative.

Protein Basics

Picture a massive factory: assembly lines whirring, conveyor belts gliding, workers welding, and supervisors organizing. Now imagine that scene in microscopic form inside a cell. That is what proteins are: molecular machines tirelessly handling just about every bodily function. They are the assembly lines, the conveyor belts, the welders, and the supervisors. They build new tissues, digest your lunch, deliver oxygen throughout the bloodstream, and some even assemble other proteins. They are the engine of life.

At their core, proteins are linear chains made up of smaller subunits called amino acids. Think of them like beads on a string. There are twenty standard amino acids used by virtually all organisms. The specific length and order of these beads is different for every protein. Once that chain is assembled, it folds itself into a 3D structure. Similar to how a stretched rubber band crinkles when released, these amino acid chains snap into shape. Crucially, the exact amino acid sequence determines the 3D fold, which in turn decides what “machine” that protein will become. An unfolded chain of amino acids is relatively inert; it is the microscopic 3D machine it folds into that gives the protein its function.

The scope of what proteins do is enormous, and it can’t be overstated:

  • Enzymes: Some proteins act as catalysts, speeding up the chemical reactions that keep you alive.

  • Messengers: Hormones like insulin are proteins that help regulate processes such as blood sugar balance.

  • Transporters: Hemoglobin carries oxygen through your blood, literally fueling every move you make.

  • Structural Elements: Proteins form the core scaffolding of hair, nails, and muscles.

If there is a job to be done in your body (or in any living organism, for that matter), you can bet there is a protein (or team of proteins) making it happen. It is as if each protein has a perfect shape, allowing it to perform a highly specific role. We know the amino acid sequences of over 200 million proteins. But until a few years ago, we knew the 3D shapes of only about 150,000 of them.

The Central Dogma of Molecular Biology

Proteins are so fundamental that they’re essentially the end product of what your DNA codes for. This journey is known as the “central dogma of molecular biology”. It consists of three steps:

  1. DNA is transcribed into RNA.

  2. RNA is then translated into an amino-acid chain.

  3. That chain folds into a functional protein.

For the most part, this dogma is core to all life, including bacteria, plants, and humans. If it is alive, it follows that pattern. You have also, without a doubt, heard the word “gene” before. A gene is a small segment of DNA, and most genes represent blueprints that encode specific proteins. Now you know.

For example, hemoglobin is a protein that carries oxygen in your blood. In sickle cell disease, the beta-globin gene on chromosome 11 carries a single-letter mutation in its DNA code (GAG becomes GTG). After transcription into RNA and translation into an amino acid chain, this tiny change swaps a single amino acid: valine replaces glutamic acid. The altered hemoglobin molecules tend to stick together into long fibers when they release oxygen, distorting red blood cells into the characteristic sickle shape.
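To make the sickle cell example concrete, here is a minimal Python sketch of the DNA to RNA to amino acid path. It is deliberately simplified: it treats the input as the DNA coding strand (so transcription is just swapping T for U), uses the standard genetic code, and ignores real-world steps such as splicing. The two short fragments are assumed to be the first nine codons of the beta-globin coding sequence, normal and with the GAG to GTG change; all function and variable names are illustrative.

```python
from itertools import product

# Standard genetic code: the 64 RNA codons in U, C, A, G order and their one-letter amino acids.
# "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """mRNA has the same sequence as the DNA coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time, stopping at a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: translation ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


# Assumed fragment: first nine codons of the beta-globin coding sequence (illustrative).
normal_fragment = "ATGGTGCATCTGACTCCTGAGGAGAAG"
# Same fragment with the sickle cell change in the seventh codon: GAG -> GTG.
sickle_fragment = "ATGGTGCATCTGACTCCTGTGGAGAAG"

normal_protein = translate(transcribe(normal_fragment))
sickle_protein = translate(transcribe(sickle_fragment))
# Positions where the two amino acid chains differ, as (index, normal, sickle).
changes = [(i, a, b) for i, (a, b) in enumerate(zip(normal_protein, sickle_protein)) if a != b]

print(normal_protein)  # MVHLTPEEK
print(sickle_protein)  # MVHLTPVEK
print(changes)         # [(6, 'E', 'V')]
```

One changed DNA letter yields one changed amino acid (E, glutamic acid, becomes V, valine). Index 6 in the zero-based output matches the conventional residue number 6 because that numbering does not count the starting methionine (M). Real genes are messier: the beta-globin gene contains introns that are spliced out of the RNA before translation.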

Even the steps of transcription and translation in the central dogma are carried out by proteins. RNA polymerase is a protein complex that reads DNA and builds a complementary RNA strand. A ribosome is a large complex of proteins and ribosomal RNA that reads the RNA message and builds the corresponding amino acid chain. It’s proteins all the way down.
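Building a “complementary” RNA strand comes down to base pairing: an A in the DNA template pairs with U in the RNA, T with A, C with G and G with C. The sketch below assumes the template strand is written 5' to 3', as sequences usually are, so it has to be traversed backwards, just as RNA polymerase reads the template in the 3' to 5' direction. The names are illustrative, not from any library.

```python
# Base pairing between a DNA template base and the RNA base placed opposite it.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_template(template_5_to_3: str) -> str:
    """Return the mRNA (written 5'->3') built against a DNA template strand given 5'->3'.

    The polymerase reads the template 3'->5', so the template is traversed in reverse
    and each base is replaced by its RNA partner.
    """
    return "".join(TEMPLATE_TO_RNA[base] for base in reversed(template_5_to_3.upper()))


# Template strand for the coding sequence ATGGTGCAT (its reverse complement).
template = "ATGCACCAT"
print(transcribe_template(template))  # AUGGUGCAU
```

The result is the coding strand ATGGTGCAT with every T swapped for U, which is a handy sanity check on the pairing rules.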

We have sequenced the DNA of countless species, but the function of many of the genes we have found has eluded us. Without knowing the shapes of the proteins those genes encode, we are left wondering what exactly they do.

A Short History of 3D Structure Prediction

For much of the 20th century, the only way to reliably determine a protein’s 3D shape was through the labor-intensive method of X-ray crystallography. This process involves:

  1. Crystallizing the protein (a feat in itself, because not all proteins are willing to form tidy crystals).

  2. Bombarding the crystal with X-rays.

  3. Observing the diffraction patterns.

  4. Reconstructing the protein’s 3D shape from that data.
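Step 4 hides the hardest part. The diffraction pattern records the intensities of the Fourier coefficients of the crystal’s electron density (called structure factors) but not their phases, and the density cannot be rebuilt without them. This is the classic “phase problem”, which crystallographers work around with additional experiments or prior knowledge of similar structures. The toy below is only an illustration under heavily simplified assumptions: a noise-free one-dimensional “electron density” on 8 grid points and a naive Fourier transform, nothing like real crystallographic software. All names are illustrative.

```python
import cmath


def dft(values):
    """Naive discrete Fourier transform: F[k] = sum over x of values[x] * exp(-2*pi*i*k*x/N)."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * x / n) for x, v in enumerate(values))
            for k in range(n)]


def inverse_dft(coeffs):
    """Inverse of dft(), keeping the real part: rho[x] = (1/N) * sum over k of F[k] * exp(+2*pi*i*k*x/N)."""
    n = len(coeffs)
    return [(sum(f * cmath.exp(2j * cmath.pi * k * x / n) for k, f in enumerate(coeffs)) / n).real
            for x in range(n)]


# Toy 1D "electron density": two atoms of different weight on an 8-point grid.
density = [0.0, 0.0, 6.0, 0.0, 0.0, 8.0, 0.0, 0.0]

structure_factors = dft(density)
# The detector records only intensities |F|^2; the phase of each F is lost.
intensities = [abs(f) ** 2 for f in structure_factors]
amplitudes = [i ** 0.5 for i in intensities]

# If the true phases were known, the density would come back (up to rounding error).
true_phases = [cmath.phase(f) for f in structure_factors]
recovered = inverse_dft([a * cmath.exp(1j * p) for a, p in zip(amplitudes, true_phases)])

# Guessing every phase as zero fits the same intensities but gives the wrong density.
naive = inverse_dft(amplitudes)

print("recovered:", [round(x, 3) for x in recovered])
print("zero-phase guess:", [round(x, 3) for x in naive])
```

Both reconstructions are consistent with exactly the same measured intensities, yet only the one with the correct phases puts the two “atoms” back where they belong.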

In 1957, British biochemist John Kendrew determined the very first protein structure, myoglobin, after some twelve years of work. This monumental effort was recognized with the 1962 Nobel Prize in Chemistry, which he shared with Max Perutz, and it set the stage for a revolution in structural biology. But for decades, progress was slow and painstaking. Only around 100 protein structures were resolved in the years that followed, often taking up entire PhD projects and costing tens of thousands of dollars per protein.

Eventually, other techniques like nuclear magnetic resonance (NMR) spectroscopy and cryo-electron microscopy (cryo-EM) joined the game, each offering its own advantages. NMR, for instance, can analyze proteins in a more natural, solution-based environment, but is limited to relatively small proteins. Cryo-EM, on the other hand, has become a powerhouse for capturing large, complex protein assemblies in exquisite detail, yet it also requires expensive, specialized equipment and expertise.

Despite these advances, time and money remained enormous barriers. Over 60 years after Kendrew’s first breakthrough, we have painstakingly mapped around 150,000 protein structures using these experimental methods, less than 0.1% of the more than 200 million proteins known to exist across Earth’s countless life forms. Clearly, these traditional approaches are too slow and expensive to map them all.

Historically, computational predictions were very limited. The Critical Assessment of Structure Prediction (CASP), a biennial competition started by John Moult at the University of Maryland (my alma mater!), challenges teams to predict a protein’s 3D shape from its amino acid sequence alone, then scores those predictions against experimentally determined structures that have not yet been made public. Despite the incredibly bright minds participating in CASP, the results often fell far short of expectations. Rarely did any software reconstruct a protein’s structure with meaningful accuracy.

Until AlphaFold.

AlphaFold

DeepMind, an artificial intelligence research lab based in London, has effectively solved the decades-long protein folding problem with AlphaFold, software that can, for the first time, accurately predict a protein’s 3D structure. When DeepMind unveiled the second version of AlphaFold at the CASP14 competition in 2020, it completely dumbfounded everyone: its predictions reached near-experimental accuracy, a level of precision many thought was still decades away.
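How is “near-experimental accuracy” judged? CASP’s headline metric is GDT_TS (Global Distance Test, total score): for each of four distance cut-offs (1, 2, 4 and 8 Å), take the fraction of residues whose predicted position lies within that distance of the experimental one, then average the four fractions and express the result out of 100. The sketch below is a simplified version that assumes the two structures are already superimposed, with one point per residue (typically the alpha-carbon); the official score also searches over many superpositions to maximize each fraction. The function name and toy coordinates are illustrative.

```python
import math

Coord = tuple[float, float, float]


def gdt_ts(predicted: list[Coord], experimental: list[Coord]) -> float:
    """Simplified GDT_TS: mean percentage of residues within 1, 2, 4 and 8 Angstroms.

    Assumes the structures are already superimposed; the real CASP score also
    searches over superpositions to maximize each fraction.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("structures must have the same, non-zero number of residues")
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [sum(d <= cutoff for d in distances) / len(distances)
                 for cutoff in (1.0, 2.0, 4.0, 8.0)]
    return 100.0 * sum(fractions) / len(fractions)


# Toy 4-residue chain: predictions are off by 0.5, 1.5, 3.0 and 10.0 Angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

print(gdt_ts(predicted, experimental))  # 56.25
```

A perfect prediction scores 100; the badly placed last residue drags this toy example down because it misses even the loosest 8 Å cut-off.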

The real bombshell was not the fact that AlphaFold could effectively solve protein folding. It was what came next: DeepMind, working with EMBL’s European Bioinformatics Institute (EMBL-EBI), used it to predict the structures of nearly all of the more than 200 million proteins known to science, then released them in a free public database: https://alphafold.ebi.ac.uk/.
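Each model in the database comes with a per-residue confidence score called pLDDT (0 to 100), and in PDB-format files AlphaFold writes it into the B-factor column, the slot that normally holds experimental atomic displacement values. A minimal sketch, assuming you have already downloaded a model file in PDB format and read it into a string: it pulls one score per residue from the alpha-carbon records and flags residues below 70, the boundary the database uses between “confident” and “low” confidence. The function names and the `_atom_line` demo helper are hypothetical, written only for this example.

```python
def plddt_per_residue(pdb_text: str) -> dict[int, float]:
    """Map residue number to pLDDT, read from the B-factor field of alpha-carbon (CA) records."""
    scores: dict[int, float] = {}
    for line in pdb_text.splitlines():
        # PDB is fixed-width: atom name in columns 13-16, residue number in 23-26
        # and B-factor in 61-66 (1-based), i.e. slices [12:16], [22:26] and [60:66].
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def low_confidence_residues(scores: dict[int, float], threshold: float = 70.0) -> list[int]:
    """Residue numbers whose pLDDT falls below the threshold (70 separates 'confident' from 'low')."""
    return sorted(res for res, score in scores.items() if score < threshold)


def _atom_line(serial: int, atom: str, res_name: str, res_seq: int, b_factor: float) -> str:
    """Hypothetical demo helper: a minimal fixed-width ATOM record (chain A, zero coordinates)."""
    return (f"ATOM  {serial:>5} {atom:<4} {res_name:>3} A{res_seq:>4}    "
            f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{b_factor:>6.2f}")


demo_pdb = "\n".join([
    _atom_line(1, "N", "MET", 1, 45.10),
    _atom_line(2, "CA", "MET", 1, 45.10),
    _atom_line(3, "CA", "VAL", 2, 88.50),
    _atom_line(4, "CA", "HIS", 3, 96.20),
])

scores = plddt_per_residue(demo_pdb)
print(scores)                           # {1: 45.1, 2: 88.5, 3: 96.2}
print(low_confidence_residues(scores))  # [1]
```

Low-confidence stretches often correspond to flexible or disordered regions, so it is worth checking these scores before reading too much into any part of a predicted shape.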

Artificial intelligence has now tackled a root-node problem in our tree of knowledge, opening up new branches of exploration that were previously unreachable. The progress now possible, across biology, medicine, and beyond, will likely be one of the most significant scientific contributions in the history of humanity. It will be a sight to behold.

The Future

A lot is about to change. Very quickly. We are on the cusp of a revolution in drug discovery, synthetic biology, and personalized medicine. We’re witnessing a fundamental shift in how we study and manipulate the building blocks of life. For example, researchers have already used AlphaFold for:

  • New Drug Discovery
    Researchers now understand the structure of a vital malaria protein, which could help them develop a truly effective malaria vaccine.
    Source: (https://www.nature.com/articles/s41467-022-33379-6)

  • Fighting Antibiotic Resistance
    Bacteria have evolved enzymes, such as beta-lactamases, that destroy antibiotics like penicillin by cleaving their chemical structure. Understanding how these proteins work can help researchers counter antibiotic resistance, potentially saving millions of lives.
    Source: (https://www.colorado.edu/lab/sousa/home)

  • Potential Climate Change Mitigations
    Synthetic biologists have begun engineering enzymes for tasks like breaking down plastic and toxic waste. By knowing a protein’s shape in detail, scientists can tweak its active site or overall structure to better catalyze a desired reaction.
    Source: (https://www.pnas.org/doi/10.1073/pnas.2121426119)

  • Reconstructing More Accurate Evolutionary History
    The biology group I worked with at university used AlphaFold to show that the machinery that splices fragmented genes together, once assumed to have evolved in mitochondria, actually originated in archaea, shaking up long-held evolutionary beliefs.
    Source: (https://www.biorxiv.org/content/10.1101/2024.12.10.627823v1)

The possibilities in front of us feel like something out of science fiction. Imagine:

  • Highly Personalized Medicine
    Rather than a one-size-fits-all approach, doctors could analyze your unique genetic makeup and use AI-driven protein modeling to customize treatments. Want a therapy engineered just for you? Knowing exactly how your proteins are shaped, and how they differ from others, makes that dream more realistic.

  • Bio-Based Manufacturing
    Envision microorganisms custom-built to produce everything from biodegradable plastics to biofuels at industrial scales, all by leveraging human-designed proteins that optimize every step of the production line. This could slash waste, curb environmental harm, and redefine manufacturing.

  • Rapid Responses to Emerging Diseases
    The next time an unknown virus or bacterium shows up, we could generate detailed protein structures in days. This speed would empower scientists to design vaccines, antivirals, or antibiotics before an outbreak turns into a global crisis.

  • Molecular Diagnostics
    Protein structure predictions could enable new diagnostic devices that detect conditions at the molecular level. For example, they might pinpoint a specific protein variant tied to a certain cancer before symptoms ever show up.

We are on the fast track to a healthier, more prosperous, more sustainable world. The impact artificial intelligence will have on science and progress should not be underestimated.
