Beyond Traditional AI: Embracing Multimodal Challenges with Meta-Transformer

In the realm of pharmaceutical research, the quest to unlock groundbreaking insights often requires navigating through a vast sea of diverse data modalities. Imagine a powerful tool that can seamlessly process and integrate information from text, images, audio, 3D point clouds, video, graphs, and more, transcending the limitations of conventional AI approaches. Enter “Meta-Transformer: […]

MILCDock – Machine Learning Consensus Docking

Molecular docking tools are commonly used in drug discovery to computationally identify new molecules through virtual screening. However, these tools often suffer from inaccurate scoring functions that can vary in performance across different proteins. To address this issue, researchers at Brigham Young University have developed MILCDock, a machine learning consensus docking tool that uses predictions from […]

Coarse-grained Molecular Dynamics with Geometric Machine Learning

We live in a world where computational chemistry is increasingly competitive with experimentation. AlphaFold predicts protein structure with accuracy sufficient for many applications. In the limit, computational chemists envision biochemistry simulations on a scale that allows them to trace the exact mechanisms of disease. A recent pre-print achieves molecular simulation with nanosecond time steps, which is 1000 […]

Machine Learning for Drug Discovery at ICLR 2022

For the last decade, the field of deep learning and AI has been dominated by applications to images and text. However, in the past two years, the field has seen an upsurge of chemical and biological applications. The International Conference on Learning Representations [ICLR] is the largest academic AI conference in the world, with an h5-index […]

The study of structural biochemistry is based on the axiom that “structure determines function”. A corollary of that axiom is that, for proteins, “function is independent of sequence, given structure”. That is, knowing the sequence of a protein is helpful for determining function only insofar as it informs us about the structure of the protein. Many recent works, including ESM-1b and ProtBERT-BFD, have attempted to predict function from sequence. But in a recent preprint, Zhang et al. achieved state-of-the-art accuracy by predicting function from structure, using self-supervised pre-training and the GearNet architecture.

Self-supervised pre-training “warms up” a model for a prediction task. It involves training on auxiliary objectives that help the model learn better representations of the data. The labels are derived from the data itself rather than being manually annotated, hence the term “self-supervised”. Many of the aforementioned sequence-to-function networks use self-supervised pre-training to learn better sequence representations, which helps them achieve high-accuracy function prediction. In the case of GearNet, the model uses auxiliary structure-based prediction tasks that make the downstream function prediction objective easier to learn.


Dataset:

Until recently, protein structure databases contained only experimental structures, which constrained the effectiveness of structure-based representation learning. However, AlphaFold has recently made it possible to create massive datasets of predicted protein structures. The 800k AlphaFold-predicted structures in the combined AlphaFold Proteome and AlphaFold Swiss-Prot datasets augment the 182k experimental structures in the Protein Data Bank, bringing the available structure datasets up to a scale suitable for massive, structure-based pre-training.

From the protein structure files, the authors generate a graph representation G of each protein. Nodes in the graph represent the alpha carbon of each residue, with a one-hot vector indicating residue identity. Edges in the graph are relational, meaning that there are multiple types of edges (or “relations”). Some edge types indicate proximity in sequence, while other edge types indicate spatial proximity.
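To make the graph construction concrete, here is a rough sketch of how such a relational graph could be assembled from alpha-carbon coordinates. The function name, the 10 Å cutoff, and the two-relation scheme are illustrative assumptions; the paper's exact edge definitions (it also uses, e.g., k-nearest-neighbor relations) differ in detail.

```python
import numpy as np

def build_residue_graph(ca_coords, residue_ids, num_residue_types=20,
                        seq_window=2, radius=10.0):
    """Sketch of a relational residue graph.

    ca_coords: (n, 3) alpha-carbon coordinates; residue_ids: (n,) integer
    residue types. Relation 0 = sequential proximity, relation 1 = spatial
    proximity (the window and cutoff values are illustrative).
    """
    n = len(residue_ids)

    # Node features: one-hot residue identity.
    node_features = np.zeros((n, num_residue_types))
    node_features[np.arange(n), residue_ids] = 1.0

    # Edges: (source, target, relation_type) triples.
    dists = np.linalg.norm(ca_coords[:, None, :] - ca_coords[None, :, :], axis=-1)
    edges = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if abs(i - j) <= seq_window:
                edges.append((i, j, 0))      # sequential relation
            elif dists[i, j] < radius:
                edges.append((i, j, 1))      # spatial relation
    return node_features, np.array(edges, dtype=np.int64)
```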

Besides the original graph, GearNet also creates a “supergraph” G’ that treats relational edges as nodes and creates new directed edges that connect incoming edges to outgoing edges.
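In graph terms, G’ is close to a line graph of G: every relational edge of G becomes a node of G’, and a directed super-edge runs from edge (i, j) to edge (j, k) whenever the first edge ends where the second begins. A minimal sketch, reusing the edge-list format from the snippet above:

```python
from collections import defaultdict

def build_supergraph(edges):
    """Connect u -> v in G' whenever u = (i, j, r1) feeds into v = (j, k, r2)."""
    outgoing = defaultdict(list)                 # node -> indices of edges leaving it
    for idx, (src, dst, rel) in enumerate(edges):
        outgoing[src].append(idx)

    super_edges = []
    for idx, (src, dst, rel) in enumerate(edges):
        for nxt in outgoing[dst]:                # edges that start where this one ends
            if nxt != idx:
                super_edges.append((idx, nxt))
    return super_edges
```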

Figure 1: The original graph G and the supergraph G’. Different colors represent different relational edge types.


Model Architecture:

GearNet is a “Geometry-Aware Relational Graph Neural Network”. It takes vector representations of nodes and edges as inputs, conducts updates to the node features and edge features, and predicts updated node features.

GearNet conducts two types of updates. The first update operates on G’, and updates edge features based on edges in its neighborhood.

Figure 2: The edge-level update on the supergraph G’.
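The following is a simplified, PyTorch-style sketch of that idea: a relational message-passing layer over G’, in which each edge of G collects messages from the edges that feed into it. The relation definitions on G’ and the exact aggregation in the paper are more involved; this only shows the shape of the computation.

```python
import torch
import torch.nn as nn

class EdgeUpdate(nn.Module):
    """Simplified relational message passing on the supergraph G'."""
    def __init__(self, edge_dim, num_super_relations):
        super().__init__()
        self.relation_linears = nn.ModuleList(
            [nn.Linear(edge_dim, edge_dim) for _ in range(num_super_relations)]
        )

    def forward(self, edge_feats, super_edges, super_relations):
        # edge_feats: (num_edges, edge_dim)
        # super_edges: (num_super_edges, 2) long tensor of (source_edge, target_edge)
        # super_relations: (num_super_edges,) relation type of each super-edge
        messages = torch.zeros_like(edge_feats)
        for r, linear in enumerate(self.relation_linears):
            mask = super_relations == r
            src, dst = super_edges[mask, 0], super_edges[mask, 1]
            messages.index_add_(0, dst, linear(edge_feats[src]))
        return torch.relu(edge_feats + messages)
```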

The second update operates on G, and updates node features based on the nodes and edges in its neighborhood.

Figure 3: The node-level update on the original graph G.
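And a matching sketch for the node-level update: a relational graph convolution on G in which each residue aggregates neighbor features per relation type and folds in the features of its incoming edges. The layer sizes, nonlinearity, and the absence of normalization are illustrative simplifications.

```python
import torch
import torch.nn as nn

class NodeUpdate(nn.Module):
    """Simplified relational convolution on the residue graph G."""
    def __init__(self, node_dim, edge_dim, num_relations):
        super().__init__()
        self.relation_linears = nn.ModuleList(
            [nn.Linear(node_dim, node_dim) for _ in range(num_relations)]
        )
        self.edge_linear = nn.Linear(edge_dim, node_dim)

    def forward(self, node_feats, edge_feats, edges):
        # node_feats: (num_nodes, node_dim); edge_feats: (num_edges, edge_dim)
        # edges: (num_edges, 3) long tensor of (source, target, relation) rows
        messages = torch.zeros_like(node_feats)
        for r, linear in enumerate(self.relation_linears):
            mask = edges[:, 2] == r
            src, dst = edges[mask, 0], edges[mask, 1]
            messages.index_add_(0, dst, linear(node_feats[src]))
        # Fold the edge features into their target nodes as extra messages.
        messages.index_add_(0, edges[:, 1], self.edge_linear(edge_feats))
        return torch.relu(node_feats + messages)
```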

Self-supervised objectives:

The model is pre-trained on five self-supervised tasks. The first is called “Multiview Contrastive Learning”. In this method, the model is given two different crops of each protein in a batch. The contrastive objective encourages the model to learn similar representations between crops from the same protein, while discouraging it from producing representations similar to crops from other proteins in the batch.
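A minimal sketch of what such a contrastive objective can look like in PyTorch (an InfoNCE-style loss; the temperature value and the cosine-similarity formulation are assumptions rather than the paper's exact objective):

```python
import torch
import torch.nn.functional as F

def multiview_contrastive_loss(z1, z2, temperature=0.07):
    """z1[i] and z2[i] embed two crops of the same protein; the other rows in
    the batch serve as negatives."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature               # (batch, batch) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)          # match crop i with crop i
```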

The other four self-supervised techniques are various types of missing information prediction problems. They involve predicting residue type, predicting distance between nodes, predicting angles between edges, and predicting dihedral angles.
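Of these, the residue-type objective is the most familiar: hide the identity of some residues, encode the graph, and classify each hidden node. A rough sketch of that loss, with illustrative names:

```python
import torch.nn.functional as F

def masked_residue_loss(node_embeddings, true_residue_ids, masked, classifier):
    """Cross-entropy over masked residues only.

    node_embeddings: (num_nodes, hidden_dim) encoder output, computed after the
    residue-type features of the masked nodes were hidden.
    masked: boolean (num_nodes,) tensor marking the hidden residues.
    classifier: e.g. nn.Linear(hidden_dim, 20).
    """
    logits = classifier(node_embeddings[masked])
    return F.cross_entropy(logits, true_residue_ids[masked])
```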


Experiments and Results:

Once the network is pre-trained, it is “fine-tuned” by adding a “projection” fully connected layer and training the entire model on the specific prediction task. The authors evaluated the network on Enzyme Commission classification, Gene Ontology classification, fold classification, and reaction classification [very similar to Enzyme Commission classification]. Their model achieves state-of-the-art performance on almost all benchmarks, while training on several orders of magnitude less data than sequence-based methods. In the best case, fold classification, GearNet achieves a 7 raw percentage point improvement over the next best model.
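As a rough sketch of that fine-tuning setup, a pre-trained encoder plus a projection layer might be wired together as below. The mean pooling over residues and the single linear head are illustrative choices, not necessarily the authors' exact configuration.

```python
import torch.nn as nn

class FineTuneHead(nn.Module):
    """Pre-trained encoder plus a fully connected projection, trained end to
    end on a downstream label such as an Enzyme Commission class."""
    def __init__(self, encoder, hidden_dim, num_classes):
        super().__init__()
        self.encoder = encoder
        self.projection = nn.Linear(hidden_dim, num_classes)

    def forward(self, graph):
        node_feats = self.encoder(graph)             # (num_nodes, hidden_dim)
        protein_feat = node_feats.mean(dim=0)        # pool residues into one vector
        return self.projection(protein_feat)
```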


Why does this work?

By learning these auxiliary tasks [e.g. predicting the identity of a masked residue, predicting whether two sub-proteins come from the same protein], the neural network needs to use structural information to infer other structural information. This is similar to the task of using structural information to predict function. So after self-supervised pre-training, the model can converge quickly to an optimum for the function prediction task.

Takeaways:

Structure-based methods also lend themselves to structure-based interpretability, helping researchers identify the structural basis of a predicted function.