In the past two months, DALL·E 2 has taken over the internet. From Bart Simpson edited into Egyptian art to Donald Trump as the Lorax, text-to-image AI produces amazing results. (Image caption: “Panda weaving a basket made of cyclohexane”, DALL·E 2.) Are these results just an impressive-but-gimmicky party trick? Or can these innovations be harnessed for applications in scientific domains? Many […]
The history of chemistry has been epitomized by individual chemists coming up with hypotheses, running experiments at lab scale, and producing discoveries. But in 2022, chemistry data is generated at an unprecedented scale, computers can rapidly process that data, and the data can be widely distributed at relatively low cost. This new frontier of global-scale […]
In drug discovery, there are two main approaches to hit finding: 1) virtual screening of existing small molecule libraries and 2) generative design of new molecules. Generative molecule design can result in better binders, but it may be unknown how to synthesize them. The task of retrosynthesis – designing a synthesis pathway for a molecule […]
We live in a world where chemistry computation is increasingly competitive with experimentation. AlphaFold predicts protein structure with accuracy sufficient for many applications. In the limit scenario, computational chemists envision biochemistry simulations on a scale that allows them to trace exact mechanisms of disease. A recent pre-print achieves molecular simulation with nanosecond time steps, which is 1000 […]
Neural sequence models have recently produced astonishing results in domains ranging from natural language to proteins and biochemistry. Current sequence models trained on text can explain jokes, answer trivia, and even write code. AlphaFold is a sequence model trained to predict protein structure with near-experimental accuracy. In the chemistry domain, sequence models have also been used for learning problems on […]
For the last decade, the field of deep learning and AI has been dominated by applications to images and text. However, in the past two years, the field has seen an upsurge of chemical and biological applications. The International Conference on Learning Representations (ICLR) is the largest academic AI conference in the world, with an h5-index […]
Extremely data-efficient ligand generation: What is a sufficient number of data points to train a deep learning algorithm? 1,000? 1 million? 1 billion? Of course, it depends on the problem. But it also depends on the neural network architecture and training algorithm chosen to solve the problem. Powers et al. recently published a preprint describing a ligand […]
The study of structural biochemistry is based on the axiom that “structure determines function”. A corollary of that axiom is that, for proteins, “function is independent of sequence, given structure”. That is, knowing the sequence of a protein is helpful for determining function insofar as it informs us about the structure of the protein. Many […]
In structure-based drug discovery, most methods rely on two key elements of accuracy: accurate protein structure modeling and accurate drug structure modeling. AlphaFold is able to predict protein structures with unprecedented accuracy. But drug structure modeling lags behind, with current models for conformer generation only providing 67% accuracy on a common molecular conformer benchmark. GeoDiff predicts drug conformations with […]