Project Description

Interpretable Deep Generative Models for Drug Development

Drug discovery is time-consuming and costly: fully developing a new drug takes approximately 10-15 years and between $500 million and $2 billion. Molecule optimization, a critical step in drug discovery, improves desired properties of drug candidates through chemical modification. For example, in lead optimization, the chemical structures of lead molecules (molecules showing both activity and selectivity towards a given target) can be altered to improve their selectivity and specificity. Conventionally, this process relies on the knowledge, intuition and experience of medicinal chemists and is carried out via fragment-based screening or synthesis. Such an approach is not scalable. The objective of this project is to develop a new class of Artificial Intelligence (AI) methods and tools to conduct in silico molecule generation. Specifically, this project will focus on the following important aspects of AI-based in silico molecule optimization: 1) major scaffold retention; 2) molecule diversity; 3) molecule synthesizability; 4) multi-property optimization; and 5) interpretability. The central hypothesis underlying the proposed research is that the increasing amount of publicly available molecule data, including molecule properties, synthesis pathways and drug-likeness, contains a wealth of information that, if properly analyzed and utilized, can provide key insights for revealing, characterizing and automating the computational molecule generation and optimization process.

Meeting these objectives will require the development of novel AI models and methods for in silico molecule optimization. The project will examine designs based on new deep generative models, deep graph convolutional networks, conditional sampling approaches and reinforcement learning methods that learn from pairs of molecular graphs and accordingly generate new molecular graphs with improved biochemical and biophysical properties. The proposed research will also provide a holistic framework to explore prospective molecules that are sufficiently different from one another, and will investigate molecular graph search approaches and Bayesian optimization methods to guide search in the latent embedding (representation) space. For multi-property optimization, the proposed research will provide a pipeline structure and new reinforcement learning approaches. To understand and facilitate interpretable generative models, the proposed research will develop a set of novel methods including network dissection, perturbation-based attribution methods, self-explaining methods and disentanglement. This project will have substantial societal and educational impacts, and will enhance diversity in STEM through education and research dissemination. The broader scientific contributions of the project will be the development of innovative AI methodologies and tools that will aid drug development. These technical innovations will not only address the key computational challenges in generative models for molecules, but also potentially generalize to other problems (e.g., cheminformatics, materials design) in which generation of structural data is highly needed and interpretation of the generation process is critical. The proposed research can potentially reduce investment costs during drug discovery, significantly increase its success rate, and ultimately help improve the quality of US health care.