Team ZONTAL January 2, 2024

Data Centricity: Key for the Successful Digital Journey towards a Digital Lab

While data meshes and data fabrics are often discussed, data centricity is rarely executed consistently, despite its significant implications. Data centricity places data at the core of operations and decision-making processes, going beyond mere data utilization in applications. To implement data centricity, three principles are crucial: recognizing data as a key asset, ensuring data […]

Disconnection-Aware Retrosynthesis

Researchers at IBM Research recently presented a novel approach to retrosynthesis. In chemical synthesis, the retrosynthesis problem involves determining the optimal sequence of steps to synthesize a given molecule starting from readily available building blocks, known as precursors. In retrosynthesis, a chemist or computational model must first identify a suitable disconnection […]

Team ZONTAL February 2, 2023

DiffDock – A Diffusion Model for Molecular Docking

Molecular docking is a critical task in drug design, as it involves predicting the binding structure of a small molecule ligand to a protein. Traditional methods for molecular docking rely on search-based algorithms and scoring functions to estimate the correctness of a proposed structure. However, these methods can be slow and inaccurate, especially for high-throughput […]

Team ZONTAL January 19, 2023

RFDiffusion – Leveraging the Power of DDPMs to Generate Protein Sequences and Structures

RFDiffusion is a new method for protein design that leverages the power of denoising diffusion probabilistic models (DDPMs) to generate protein sequences and protein structures. This approach represents a significant advance in the field of protein design, as it allows for the design of complex protein architectures and functions from simple molecular specifications. Figure 1: RFDiffusion […]

Team ZONTAL January 10, 2023

MILCDock – Machine Learning Consensus Docking

Molecular docking tools are commonly used in drug discovery to computationally identify new molecules through virtual screening. However, these tools often suffer from inaccurate scoring functions that can vary in performance across different proteins. To address this issue, researchers at Brigham Young University have developed MILCDock, a machine learning consensus docking tool that uses predictions from […]

Team ZONTAL September 1, 2022

DALL·E 2, Imagen, and Applications to Chemistry

In the past two months, DALL·E 2 has taken over the internet. From Bart Simpson edited into Egyptian art to Donald Trump as the Lorax, text-to-image AI produces amazing results. Caption: “Panda weaving a basket made of cyclohexane”, DALL·E 2. Is this an impressive but gimmicky party trick, or can these innovations be harnessed for applications in scientific domains? Many […]

Team ZONTAL August 23, 2022

Making Chemistry Knowledge Machine-Actionable

The history of chemistry has been epitomized by individual chemists coming up with hypotheses, running experiments at lab-scale, and producing discoveries. But in 2022, chemistry data is generated at a scale previously unseen, computers can rapidly process that data, and the data can be widely distributed at relatively minimal cost. This new frontier of global-scale […]

Team ZONTAL August 12, 2022

Transformer Retrosynthesis

In drug discovery, there are two main approaches to hit finding: 1) virtual screening of existing small molecule libraries and 2) generative design of new molecules. Generative molecule design can result in better binders, but it may be unknown how to synthesize them. The task of retrosynthesis – designing a synthesis pathway for a molecule […]

Coarse-grained Molecular Dynamics with Geometric Machine Learning


We live in a world where chemistry computation is increasingly competitive with experimentation. AlphaFold predicts protein structure with accuracy sufficient for many applications. In the limit, computational chemists envision biochemistry simulations at a scale that allows them to trace exact mechanisms of disease. A recent pre-print achieves molecular simulation with nanosecond time steps, which are 1000 times longer than typical molecular dynamics (MD) time steps. It does this while retaining the same macro-molecular behavior as traditional MD. This allows for longer simulations with larger, more complex systems.

Atomic simulations lose accuracy as they increase in computational efficiency. Density Functional Theory (DFT) simulations accurately model bond-breaking and bond-forming at the subatomic scale. Molecular Dynamics (MD) simulations model higher-level inter-atomic interactions via potentials, or force fields, while sacrificing accuracy for bond forming and breaking in exchange for faster computation and longer simulations. The authors’ approach is no different – they trade atom-level simulation for longer simulations that retain macro-molecular behavior.


Rather than modeling individual atoms, the approach models cluster centers, called “beads”.
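As a rough illustration (not the authors' code), coarse-graining can be sketched as assigning each atom to a cluster and taking the mass-weighted center of each cluster as the bead position. The `coarse_grain` helper and its `assignment` input are hypothetical stand-ins for the paper's graph-clustering step:

```python
import numpy as np

def coarse_grain(atom_xyz, atom_mass, assignment, n_beads):
    """Map atom coordinates to bead positions as mass-weighted cluster centers.

    assignment[i] is the bead index of atom i (produced by some
    graph-clustering algorithm, not shown here).
    """
    bead_xyz = np.zeros((n_beads, 3))
    bead_mass = np.zeros(n_beads)
    for i, b in enumerate(assignment):
        bead_xyz[b] += atom_mass[i] * atom_xyz[i]
        bead_mass[b] += atom_mass[i]
    return bead_xyz / bead_mass[:, None]
```

With equal masses this reduces to the geometric centroid of each cluster.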


Stochastic Predictions

Because coarse-grained MD leads to inherent approximation error, one goal of the authors’ architecture was to model randomness in the predictions. To do this, instead of predicting a single number for the next time step, the architecture predicts a distribution given by a mean and a variance.
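A minimal sketch of this idea, assuming the network emits a per-bead mean and log-variance (the `predict_step` helper is a hypothetical illustration, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_step(mean, log_var):
    """Sample a stochastic next-step prediction from the Gaussian
    parameterized by the network's mean and log-variance outputs."""
    std = np.exp(0.5 * log_var)
    return mean + std * rng.standard_normal(mean.shape)
```

Predicting a log-variance rather than the variance directly keeps the standard deviation positive without constrained optimization.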


Historical Information

The coarse-graining procedure removes the Markov property of the dynamics, so they also designed their architecture to incorporate historical information from previous states.
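One simple way to expose the last k states to a model is a sliding window over (position, velocity) pairs. The `HistoryBuffer` class below is a hypothetical illustration of that bookkeeping, not the authors' implementation:

```python
from collections import deque
import numpy as np

class HistoryBuffer:
    """Keep the last k (position, velocity) states per bead as model input."""

    def __init__(self, k):
        self.states = deque(maxlen=k)  # oldest state is dropped automatically

    def push(self, pos, vel):
        self.states.append((pos, vel))

    def features(self):
        # Concatenate the stored positions and velocities into one
        # feature vector per bead.
        pos = np.concatenate([p for p, _ in self.states], axis=-1)
        vel = np.concatenate([v for _, v in self.states], axis=-1)
        return np.concatenate([pos, vel], axis=-1)
```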



The model consists of three learned networks which are trained end-to-end.

The first network, the Embedding GNN, takes as input a fine-level (atom-level) graph and produces node embeddings that are shared across time steps. Atoms are then grouped with a graph clustering algorithm.

The second network, the Dynamics GNN, takes as input the node embeddings as well as the node positions and velocities for the last k time steps. Based on their clusters, these embeddings, positions, and velocities are combined into node and edge features for a coarse graph. The Dynamics GNN processes this graph to predict a mean and a standard deviation for the acceleration of the nodes (“beads”) for the coarse graph.

With just the first two networks, the architecture ran into stability issues, experiencing “bead collision”, where two beads come within 1 Å of each other. Thus, the authors added a third “Score GNN” network to predict a gradient of the predicted probability density which, when applied to the predicted coordinates, denoises them to the true coordinates.
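The collision condition itself is easy to check with a naive pairwise-distance test (the `has_collision` helper is a hypothetical diagnostic, not part of the model):

```python
import numpy as np

def has_collision(bead_xyz, threshold=1.0):
    """True if any two beads are closer than `threshold` (in Å)."""
    diff = bead_xyz[:, None, :] - bead_xyz[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    n = len(bead_xyz)
    dist[np.arange(n), np.arange(n)] = np.inf  # ignore self-distances
    return bool((dist < threshold).any())
```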

The model is trained end-to-end, where the objective function is to minimize the negative-log-likelihood of the data under the predicted distribution.
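For a Gaussian prediction, that objective is the standard negative log-likelihood; a minimal per-element version, shown for illustration rather than as the authors' exact loss:

```python
import numpy as np

def gaussian_nll(target, mean, log_var):
    """Negative log-likelihood of `target` under N(mean, exp(log_var))."""
    return 0.5 * (log_var
                  + (target - mean) ** 2 / np.exp(log_var)
                  + np.log(2 * np.pi))
```

Minimizing this pushes the mean toward the data while letting the variance widen where the dynamics are genuinely uncertain.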

At inference time, the model predicts bead acceleration, and new bead positions are calculated using Euler integration.
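Assuming a plain explicit Euler integrator (the pre-print's exact update rule is not reproduced here), one inference step could look like:

```python
def euler_step(pos, vel, acc, dt):
    """Advance bead positions and velocities by one explicit Euler step,
    given the model-predicted acceleration `acc`."""
    new_pos = pos + dt * vel
    new_vel = vel + dt * acc
    return new_pos, new_vel
```

Rolling this out repeatedly, with a fresh acceleration sampled from the predicted distribution at each step, produces a stochastic trajectory.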


The model architecture.



Because the model does not predict atom-level coordinate updates, the authors had to find other ways to evaluate their coarse-grained model. One metric is the “radius of gyration”, the mass-weighted root-mean-square distance of a molecule’s atoms from its center of mass. The predicted radius of gyration was found to correlate strongly (r^2 = 0.90) with the true radius of gyration for the coarse states.
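The radius of gyration is straightforward to compute from bead or atom coordinates; a small numpy version for illustration:

```python
import numpy as np

def radius_of_gyration(xyz, mass=None):
    """Mass-weighted RMS distance of coordinates from their center of mass."""
    if mass is None:
        mass = np.ones(len(xyz))
    com = np.average(xyz, axis=0, weights=mass)
    sq_dist = np.sum((xyz - com) ** 2, axis=1)
    return np.sqrt(np.average(sq_dist, weights=mass))
```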

The authors also compared predicted and true “relaxation times” of the molecules, and found an r^2 correlation of 0.48. Correlation in this metric demonstrates that the model not only matches the distribution over states, but also captures realistic dynamics.


The authors present a new architecture for coarse-grained molecular dynamics simulation. The model operates on a much larger time step than traditional MD, allowing for larger simulations on larger timescales. This is promising work in the direction of large-scale molecular simulation.

There are two main caveats. One is that the network models the acceleration of cluster centers, limiting its usefulness for atomistic modeling. However, atom positions can be inferred using techniques like the one described in this paper.

The other caveat is easily remedied and would likely provide a significant performance boost. The chosen GNN architectures are not equivariant to the frame of reference. Instead, they learn equivariance from data, which takes learning capacity away from the task of learning dynamics. This could be fixed by using an equivariant architecture for the GNN, for example, an SE(3) transformer.