Today we break down a paper recently accepted for publication at ICLR 2022: “Independent SE(3) Equivariant Models for End-to-End Rigid Docking” [1]. Docking is the problem of finding the position and orientation (the pose) in which a ligand binds to a protein. Solving the computational docking problem would increase our understanding of biological interactions at the molecular level and catalyze a leap forward in drug discovery. But current docking methods are slow, which prevents the use of docking at scale. The authors’ method, “EquiDock”, takes advantage of symmetries in 3D space to achieve an 80-500x speed-up in protein-protein docking. How did they do it? Let’s dive in.

“Inductive Bias” is a fancy term used by machine learning researchers to describe design elements in the structure of a neural network that adapt it to a specific problem. For example, convolutional neural networks (CNNs) are designed to be “translation invariant” – after being trained on images with dogs in the bottom left quadrant, a CNN will still recognize images with dogs in the top right quadrant. By creating a network with good “inductive bias”, the learning problem becomes easier for the network to solve and the network can be expected to generalize better.

Similarly, when working with data in 3D Euclidean space, like proteins and other molecules, we can take known physical principles and build them into neural networks. In the case of protein-protein docking, the docking pose is “equivariant” to the initial positions and orientations of the two proteins: when the inputs are rotated or shifted, the pose rotates and shifts in the same way. EquiDock predicts docking poses that are equivariant to the poses of the input proteins.

The algorithm is also pairwise independent, meaning that if you reverse the roles of protein 1 and protein 2, the result is the same.

Now let’s get into the algorithm. At the heart of EquiDock is an equivariant keypoint-prediction graph neural network, combined with a differentiable “keypoint alignment” algorithm.

First, the algorithm creates two graphs, {V1, E1} and {V2, E2}, one for each protein. It treats C-alpha atoms as “nodes”, with edges between C-alpha atoms that fall within a certain radius of each other. The network takes as input the C-alpha coordinates of the two proteins, as well as node features (e.g. amino acid identity) and edge features (e.g. distance between C-alpha atoms) that are invariant to the coordinate frame of reference.
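
The graph construction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `build_radius_graph` and the 8 Å cutoff are assumptions made for the example. It also checks that the distance edge features do not change when the whole protein is rotated and shifted.

```python
import numpy as np

def build_radius_graph(coords, radius=8.0):
    """Connect every pair of C-alpha atoms closer than `radius`.

    `radius=8.0` is an illustrative cutoff, not a value from the paper.
    Returns directed edges of shape (2, E) and their lengths of shape (E,).
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Keep pairs within the cutoff, excluding self-loops on the diagonal.
    mask = (dist < radius) & ~np.eye(len(coords), dtype=bool)
    src, dst = np.nonzero(mask)
    return np.stack([src, dst]), dist[src, dst]

# Toy "protein": four C-alpha atoms spaced 5 units apart on a line.
coords = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0],
                   [10.0, 0.0, 0.0], [15.0, 0.0, 0.0]])
edges, dists = build_radius_graph(coords)

# Rotate about the z axis and translate: a change of reference frame.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
moved = coords @ R.T + np.array([3.0, -2.0, 1.0])
edges_moved, dists_moved = build_radius_graph(moved)
```

Because only pairwise distances enter the construction, `edges_moved` and `dists_moved` match the originals, which is what "invariant to the coordinate frame of reference" means in practice.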

The IEGMN (Independent E(3)-Equivariant Graph Matching Network) then transforms the coordinates, node features, and edge features, combining information from all of these to update the node features and coordinates. This is done recurrently, with shared weights for each layer. The last layer in the IEGMN predicts K “keypoints”, or predicted binding pocket points, as a weighted sum of the updated node coordinates, with weights given by a multi-head attention mechanism based on the updated node features from each protein. Crucially, all updates to the node coordinates, including this final keypoint layer, respect equivariance.
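
To see why the keypoint layer respects equivariance, here is a minimal sketch of an attention-weighted sum of coordinates. It is an assumption-laden toy rather than the paper's layer: the name `predict_keypoints`, the per-head matrices `W`, and the use of the mean of the other protein's features as the query are all choices made for illustration. The key property is that the weights depend only on invariant features and sum to one over the nodes.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def predict_keypoints(X1, h1, h2, W):
    """Return K keypoints for protein 1 as convex combinations of its nodes.

    X1: (n, 3) node coordinates, h1: (n, d) and h2: (m, d) invariant features,
    W: (K, d, d) one hypothetical projection matrix per attention head.
    """
    query = h2.mean(axis=0)                       # summary of the other protein
    scores = np.einsum('nd,kde,e->kn', h1, W, query) / np.sqrt(h1.shape[1])
    attn = softmax(scores, axis=1)                # (K, n), each row sums to 1
    return attn @ X1                              # (K, 3)

rng = np.random.default_rng(0)
X1 = rng.normal(size=(6, 3))
h1 = rng.normal(size=(6, 4))
h2 = rng.normal(size=(5, 4))
W = rng.normal(size=(3, 4, 4))
keypoints = predict_keypoints(X1, h1, h2, W)

# Apply a rigid motion to the input coordinates only; features are unchanged.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([3.0, -2.0, 1.0])
keypoints_moved = predict_keypoints(X1 @ R.T + t, h1, h2, W)
```

Since each row of `attn` sums to one, moving the inputs by (R, t) moves every keypoint by the same (R, t): `keypoints_moved` equals `keypoints @ R.T + t` up to floating-point error.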

The next step is a differentiable singular value decomposition (SVD), which recovers the rotation that best aligns the predicted keypoint coordinates (Y1) of one protein with the keypoint coordinates (Y2) of the other. The translation is then the vector that carries the centroid of the rotated Y1 onto the centroid of Y2.
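
This SVD-based alignment is commonly known as the Kabsch algorithm. The sketch below is a plain NumPy version under the assumption that keypoints are stored as rows; the function name `kabsch` and the toy data are illustrative, and the paper's implementation runs inside an autodiff framework so that gradients flow through the SVD.

```python
import numpy as np

def kabsch(Y1, Y2):
    """Find rotation R and translation t minimizing ||Y1 @ R.T + t - Y2||."""
    c1, c2 = Y1.mean(axis=0), Y2.mean(axis=0)
    H = (Y1 - c1).T @ (Y2 - c2)          # 3x3 cross-covariance matrix
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c2 - R @ c1                      # carries rotated centroid of Y1 onto Y2's
    return R, t

# Toy keypoints and a known rigid motion to recover.
Y1 = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0],
               [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]])
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([3.0, -2.0, 1.0])
Y2 = Y1 @ R_true.T + t_true

R, t = kabsch(Y1, Y2)
```

On this noise-free example the recovered `R` and `t` match `R_true` and `t_true`. With real predicted keypoints the two sets will not align exactly, and the result is the least-squares best fit.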

The predicted rotation and translation for Y1 are then applied to the original coordinates X1 to obtain a final prediction for the coordinates of the docked protein. The main loss is the mean-squared error between this prediction and the true coordinates.
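
As a rough sketch of this last step, the snippet below applies a rigid transform to a set of coordinates and scores it against a reference. The helper names `apply_rigid` and `docking_mse` are hypothetical, and averaging the squared distance per atom is an assumed normalization; the paper may normalize differently.

```python
import numpy as np

def apply_rigid(X, R, t):
    """Rotate and translate coordinates stored as rows of X."""
    return X @ R.T + t

def docking_mse(X_pred, X_true):
    """Mean over atoms of the squared distance to the true position."""
    return float(np.mean(np.sum((X_pred - X_true) ** 2, axis=1)))

X1 = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 3.0, 0.0]])

# A perfect prediction (identity transform, unchanged target) scores zero.
loss_perfect = docking_mse(apply_rigid(X1, np.eye(3), np.zeros(3)), X1)

# A prediction that is off by one unit along x for every atom scores one.
loss_shifted = docking_mse(apply_rigid(X1, np.eye(3), np.array([1.0, 0.0, 0.0])), X1)
```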

The architecture is shown below:

Now, there are some unanswered questions left. What is to prevent the network from learning keypoints that don’t correspond to the true binding pocket? Does this network respect physical non-intersection constraints?

The authors address the first question by adding an auxiliary optimal transport loss that pushes the predicted keypoints to match the true binding pocket points.
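
To give a feel for what an optimal transport loss measures, here is a small sketch that uses entropy-regularized Sinkhorn iterations. This is a swapped-in approximation chosen because it fits in a few lines of NumPy, not the solver or the exact cost used by the authors; the function name and the values of `eps` and `n_iters` are assumptions.

```python
import numpy as np

def sinkhorn_ot_loss(keypoints, pocket, eps=0.5, n_iters=200):
    """Approximate OT cost between two point sets with uniform masses."""
    # Squared Euclidean distance between every keypoint and pocket point.
    cost = ((keypoints[:, None, :] - pocket[None, :, :]) ** 2).sum(axis=-1)
    a = np.full(len(keypoints), 1.0 / len(keypoints))
    b = np.full(len(pocket), 1.0 / len(pocket))
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)   # rescale to match the pocket-point masses
        u = a / (K @ v)     # rescale to match the keypoint masses
    plan = u[:, None] * K * v[None, :]   # soft assignment between the sets
    return float((plan * cost).sum())

pocket = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0],
                   [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]])

# Same points in a different order: OT ignores ordering, so the loss is ~0.
loss_permuted = sinkhorn_ot_loss(pocket[[2, 0, 3, 1]], pocket)

# Every keypoint displaced by one unit: the loss is at least 1.
loss_shifted = sinkhorn_ot_loss(pocket + np.array([1.0, 0.0, 0.0]), pocket)
```

The useful property is that the loss does not require knowing which keypoint corresponds to which pocket point; the transport plan finds the matching.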

To address the second question, they define a function that describes the surface of each protein and add an auxiliary loss that penalizes atoms of the other protein falling inside this surface.
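
One plausible way to build such a penalty is sketched below. It assumes a surface function defined as a smoothed minimum of squared distances to a protein's atoms, with a hinge penalty whenever an atom of the other protein gets a value below a threshold. The functional form, the names `surface_fn` and `non_intersection_loss`, and the values of `sigma` and `gamma` are assumptions for illustration rather than details stated above.

```python
import numpy as np

def surface_fn(points, atoms, sigma=25.0):
    """Soft minimum of squared distances from each point to a set of atoms.

    Large values mean "far outside the protein"; small values mean "inside".
    """
    d2 = ((points[:, None, :] - atoms[None, :, :]) ** 2).sum(axis=-1)
    z = -d2 / sigma
    z_max = z.max(axis=1, keepdims=True)           # log-sum-exp, stabilized
    lse = z_max[:, 0] + np.log(np.exp(z - z_max).sum(axis=1))
    return -sigma * lse

def non_intersection_loss(X1, X2, gamma=10.0, sigma=25.0):
    """Hinge penalty on atoms of one protein that fall inside the other."""
    pen_1 = np.maximum(0.0, gamma - surface_fn(X1, X2, sigma)).mean()
    pen_2 = np.maximum(0.0, gamma - surface_fn(X2, X1, sigma)).mean()
    return float(pen_1 + pen_2)

A = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0],
              [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]])

# Two copies far apart do not intersect, so there is no penalty.
loss_far = non_intersection_loss(A, A + np.array([100.0, 0.0, 0.0]))

# Two nearly coincident copies overlap heavily and are penalized.
loss_overlap = non_intersection_loss(A, A + 0.5)
```

Because every operation here is differentiable almost everywhere, a term like this can be added to the training loss alongside the main coordinate error.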

The model performs well, achieving better accuracy than many docking algorithms that take 100x longer, though it is less accurate than the HDock algorithm. Even so, EquiDock reaches its accuracy far faster, which makes high-throughput docking practical.

What are some possible extensions of this research? The most natural is to apply the same model to drug-protein docking. In principle the same algorithm could be used, with drug atom coordinates in place of C-alpha coordinates. The authors also plan to extend the work to flexible docking and molecular dynamics, allowing for flexibility in protein-ligand interactions.

What are some potential use cases of this model? With fast protein-protein docking, it is possible to computationally create much larger protein-protein interaction networks. These protein-protein interaction networks, paired with other types of interaction networks, can be mined for new scientific discoveries about biochemical pathways, biochemical causes of disease, causes of drug toxicity, and more.

[1] “Independent SE(3) Equivariant Models for End-to-End Rigid Docking”, https://arxiv.org/pdf/2111.07786.pdf