Team ZONTAL, September 1, 2022

DALL·E 2, Imagen, and Applications to Chemistry

In the past two months, DALL·E 2 has taken over the internet. From Bart Simpson edited into Egyptian art to Donald Trump as the Lorax, text-to-image AI produces amazing results.

Caption: “Panda weaving a basket made of cyclohexane”, DALL·E 2

Are these images just an impressive-but-gimmicky party trick? Or can these innovations be harnessed for applications in scientific domains?

Many AI methods are developed in controlled research settings before seeing practical adoption, which lets researchers measure improvements to the algorithms before applying them in real life. For example, reinforcement learning algorithms that were tuned on video games are now used for robotics. Likewise, the Transformer architecture that was developed for text (admittedly a useful application in its own right) was adapted for AlphaFold, a model that has now predicted structures for nearly every protein known to science.

The backbone of text-to-image AI is a newly popular type of neural network called a “diffusion model”. These models gradually transform an image of random pixels into a high-resolution image. The DALL·E 2 (OpenAI) and Imagen (Google) models render photorealistic images from arbitrary text descriptions.
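To make "gradually transform random pixels into an image" concrete, here is a minimal sketch in Python with NumPy. It assumes the standard denoising-diffusion (DDPM) update rule and a linear noise schedule, neither of which is spelled out in this post. The names `make_schedule`, `oracle_noise`, and `generate` are hypothetical, and the "image" is just four pixels. A real model would use a trained neural network to predict the noise; this sketch swaps in an exact formula that only exists because the toy pixels are Gaussian.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule: per-step betas, alphas, and cumulative alpha products."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def oracle_noise(x_t, alpha_bar, pixel_means, pixel_std):
    """Exact E[noise | x_t] when every clean pixel is Gaussian.

    Assumption: this closed form stands in for the trained noise-prediction network.
    """
    var_t = alpha_bar * pixel_std**2 + (1.0 - alpha_bar)
    return np.sqrt(1.0 - alpha_bar) * (x_t - np.sqrt(alpha_bar) * pixel_means) / var_t

def generate(num_images, pixel_means, pixel_std, seed=0):
    """Start from random pixels and denoise step by step."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule()
    x = rng.standard_normal((num_images, pixel_means.size))  # pure noise
    for t in range(len(betas) - 1, -1, -1):
        eps_hat = oracle_noise(x, alpha_bars[t], pixel_means, pixel_std)
        # Remove the predicted noise for this step.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # Re-inject a smaller amount of fresh noise on every step except the last.
            x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x

target = np.array([0.0, 1.0, 1.0, 0.0])  # toy 4-pixel "image"
images = generate(num_images=2000, pixel_means=target, pixel_std=0.1)
print(images.mean(axis=0).round(2))
```

Each generated row starts as noise and ends close to the target pixel pattern, with small sample-to-sample variation.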

Caption: “Elephant toothpaste explosion with foam in the shape of an elephant”, DALL·E 2

So how can diffusion models help chemistry?

First, we should differentiate between conditional and unconditional diffusion models. Unconditional diffusion models generate images randomly sampled from the distribution of the training data. Conditional diffusion models, on the other hand, steer the generated image using some other type of input. In the case of DALL·E 2 and Imagen, this other input is text.
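The difference can be sketched with a toy example. It is an illustration under stated assumptions, not how DALL·E 2 or Imagen are implemented. The "training data" is a one-dimensional mixture of two clusters, an integer `label` stands in for the text prompt, and the noise predictor is an exact formula rather than a trained network. `MODE_MEANS`, `MODE_STD`, `oracle_noise`, and `sample` are hypothetical names, and the update rule is the standard DDPM step.

```python
import numpy as np

MODE_MEANS = np.array([-2.0, 2.0])  # two equally likely clusters in the toy training data
MODE_STD = 0.3

def oracle_noise(x_t, alpha_bar, label=None):
    """E[noise | x_t] for the toy data; label=None gives the unconditional model."""
    var_t = alpha_bar * MODE_STD**2 + (1.0 - alpha_bar)
    centers = np.sqrt(alpha_bar) * MODE_MEANS   # cluster centers after noising
    diffs = x_t[:, None] - centers[None, :]     # shape (num_samples, 2)
    if label is None:
        # Unconditional: weight each cluster by how likely it is to have produced x_t.
        log_w = -0.5 * diffs**2 / var_t
        log_w -= log_w.max(axis=1, keepdims=True)
        weights = np.exp(log_w)
        weights /= weights.sum(axis=1, keepdims=True)
    else:
        # Conditional: the extra input says which cluster to aim for.
        weights = np.zeros_like(diffs)
        weights[:, label] = 1.0
    return np.sqrt(1.0 - alpha_bar) * (weights * diffs).sum(axis=1) / var_t

def sample(num_samples, label=None, num_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(num_samples)        # start from pure noise
    for t in range(num_steps - 1, -1, -1):
        eps_hat = oracle_noise(x, alpha_bars[t], label)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.standard_normal(num_samples)
    return x

unconditional = sample(4000)           # lands in either cluster
conditional = sample(4000, label=1)    # lands in the cluster around +2
print((unconditional > 0).mean(), conditional.mean())
```

The sampling loop is identical in both cases. Only the noise predictor sees the condition, which is also where text enters a text-to-image model.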

Diffusion models have some constraints. The output must have a fixed size: for example, diffusion models can generate images of a fixed resolution, but they cannot generate variable-length sentences. Additionally, diffusion model outputs must be continuous-valued (though it is possible to generate continuous-valued embeddings of discrete-valued data). Finally, diffusion models are generative models, meaning that they are designed for sampling from distributions. So it makes the most sense to use them when there is an interesting distribution over an output variable, as opposed to a single correct value.
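One common workaround for the first two constraints can be sketched as follows, assuming a toy vocabulary and one-hot vectors as the simplest possible embedding. Discrete tokens are padded to a fixed length, mapped to continuous vectors, and mapped back by picking the nearest embedding. `VOCAB`, `MAX_LEN`, `encode`, and `decode` are hypothetical names for illustration. The added Gaussian noise merely imitates the small residual error a generative model might leave.

```python
import numpy as np

VOCAB = ["<pad>", "C", "N", "O"]  # hypothetical toy vocabulary; index 0 is padding
MAX_LEN = 6                       # fixed output size the model would be built for

def encode(tokens, max_len=MAX_LEN):
    """Map a token list to a fixed-size, continuous-valued array of shape (max_len, len(VOCAB))."""
    if len(tokens) > max_len:
        raise ValueError("sequence longer than the fixed output size")
    ids = [VOCAB.index(tok) for tok in tokens] + [0] * (max_len - len(tokens))
    return np.eye(len(VOCAB))[ids]

def decode(array):
    """Snap each row to its nearest one-hot embedding (the largest coordinate) and drop padding."""
    ids = array.argmax(axis=1)
    return [VOCAB[i] for i in ids if i != 0]

clean = encode(list("CCO"))
rng = np.random.default_rng(0)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)  # imperfect continuous output
print(decode(noisy))
```

The padding token is what lets a fixed-size array represent sequences of different lengths, up to `MAX_LEN`.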

What chemistry prediction problems fall within these constraints? Two recent papers predicted molecular conformers using diffusion models conditioned on molecular SMILES strings. GeoDiff predicts molecular coordinates, and Torsional Diffusion predicts molecular torsion angles. Protein structure generation also fits within this framework, as another recent paper predicts protein atom coordinates conditioned on backbone constraints.

Caption: “Jedi uses a lightsaber to slice DNA in half”, DALL·E 2

Diffusion models match the high-fidelity samples of generative adversarial networks (GANs) while keeping a simpler architecture and being easier to train. Though chemistry applications are just beginning to emerge, diffusion models have yielded state-of-the-art accuracy in molecular conformer generation and offer a promising approach for other high-dimensional sampling problems.