Computational Methods to Explore Big Bioassay Data for Better Compound Prioritization

Project Description

Major Research Products

Multi-Assay-based Compound Prioritization

Junfeng Liu and Xia Ning. Multi-assay-based compound prioritization via assistance utilization: A machine learning framework. Journal of Chemical Information and Modeling, 57(3):484–498, 2017. PMID: 28234477. IF: 3.760.

Abstract: Effective prioritization of chemical compounds that show promising bioactivities from compound screenings represents a first critical step toward identifying successful drug candidates. Current development of computational approaches for compound prioritization is largely focused on devising advanced ranking algorithms that better learn the ordering among compounds. However, such methodologies are fundamentally limited by the scarcity of available data, particularly when the screenings are conducted at a relatively small scale over known promising compounds. Instead, in this work, we explore the structures of bioassay space and leverage such structures to improve the ranking performance of an existing strong ranking algorithm. This is done by identifying assistance bioassays and assistance compounds intelligently and leveraging such assistance within the existing ranking algorithm. By leveraging the assistance bioassays and assistance compounds, the data scarcity can be properly compensated for. Along this line, we develop a suite of assistance bioassay selection methods and assistance compound selection methods. Our experiments demonstrate an overall 8.34% improvement in ranking performance over the state of the art.

Code and data are available here.
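
To make the idea of assistance selection concrete, here is a minimal sketch. It is not the authors' released code. The abstract does not say how assistance bioassays are chosen, so this example assumes, purely for illustration, that a bioassay is a useful assistant when it shares many tested compounds with the target bioassay (Jaccard overlap). The function names and the dictionary layout are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of compound IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_assistance_bioassays(target, bioassays, k=2):
    """Return the k bioassays most similar to the target, excluding the target.

    bioassays: dict mapping bioassay ID -> iterable of compound IDs
    (a hypothetical layout chosen for this sketch).
    """
    scores = {
        name: jaccard(bioassays[target], compounds)
        for name, compounds in bioassays.items()
        if name != target
    }
    # Sort by descending similarity; break ties by name for determinism.
    return sorted(scores, key=lambda name: (-scores[name], name))[:k]


def select_assistance_compounds(target, selected, bioassays):
    """Compounds tested in the selected bioassays but not in the target."""
    own = set(bioassays[target])
    extra = set()
    for name in selected:
        extra |= set(bioassays[name]) - own
    return sorted(extra)


bioassays = {
    "A": ["c1", "c2", "c3"],
    "B": ["c2", "c3", "c4"],
    "C": ["c1", "c9"],
    "D": ["c7", "c8"],
}
helpers = select_assistance_bioassays("A", bioassays, k=2)
extra_compounds = select_assistance_compounds("A", helpers, bioassays)
```

The extra compounds would then be added to the training data of the ranking model for the target bioassay. Any other similarity measure could be swapped in for `jaccard` without changing the rest of the sketch.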

Differential Compound Prioritization

Junfeng Liu and Xia Ning. Differential compound prioritization via bi-directional selectivity push with power. Journal of Chemical Information and Modeling, 57(12):2958–2975, 2017. PMID: 29178784. IF: 3.760.

Abstract: Effective in silico compound prioritization is critical to identify promising candidates in the early stages of drug discovery. Current methods typically focus on compound ranking based on one single property, for example, activity, against a single target. However, compound selectivity is also a key property that should be considered simultaneously so as to reduce the likelihood of undesired side effects of future drugs. In this paper, we present a novel machine-learning-based differential compound prioritization method, dCPPP. The dCPPP method learns compound prioritization models that rank active compounds well and, at the same time, preferably rank selective compounds higher via a bi-directional push strategy. The bi-directional push is enhanced with push powers that are determined by the ranking difference of selective compounds over multiple bioassays. Our experiments demonstrate that dCPPP achieves an overall 19.22% improvement in prioritizing selective compounds over baseline models.

Code and data are available here.
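
The combination of an activity ranking term with a selectivity push can be sketched as follows. This is not the dCPPP objective and not the authors' released code: it is a simplified, single pairwise push with a hinge penalty, chosen only to show how a per-compound push power can weight the selectivity term. The function names, the hinge form, the `alpha` trade-off weight, and the way `power` is supplied are all assumptions made for this example.

```python
def hinge(margin):
    """Hinge penalty: zero once the score gap reaches 1."""
    return max(0.0, 1.0 - margin)


def push_ranking_loss(scores, activity, selective, power, alpha=0.5):
    """Pairwise ranking loss plus a weighted selectivity push.

    scores:    model scores, one per compound
    activity:  measured activities (higher means more active)
    selective: booleans marking selective compounds
    power:     per-compound push weights (hypothetical; the abstract derives
               push powers from ranking differences across bioassays)
    alpha:     assumed trade-off between the two terms
    """
    n = len(scores)
    # Term 1: a more active compound should score above a less active one.
    rank_terms = [
        hinge(scores[i] - scores[j])
        for i in range(n) for j in range(n)
        if activity[i] > activity[j]
    ]
    # Term 2: a selective compound should score above a non-selective one,
    # weighted by the selective compound's push power.
    push_terms = [
        power[i] * hinge(scores[i] - scores[j])
        for i in range(n) for j in range(n)
        if selective[i] and not selective[j]
    ]
    rank_loss = sum(rank_terms) / len(rank_terms) if rank_terms else 0.0
    push_loss = sum(push_terms) / len(push_terms) if push_terms else 0.0
    return rank_loss + alpha * push_loss


activity = [9.0, 7.0, 5.0]
selective = [True, False, False]
power = [2.0, 0.0, 0.0]
loss_good = push_ranking_loss([3.0, 2.0, 1.0], activity, selective, power)
loss_bad = push_ranking_loss([1.0, 2.0, 3.0], activity, selective, power)
```

A scoring that orders compounds correctly with a margin of at least 1 incurs no loss, while a reversed ordering is penalized by both terms, and more heavily where the push power is larger.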

Drug Selection via Joint Push and Learning to Rank

Yicheng He, Junfeng Liu, and Xia Ning. Drug selection via joint push and learning to rank. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2018. In press. IF: 2.428.

Abstract: Selecting the right drugs for the right patients is a primary goal of precision medicine. In this manuscript, we consider the problem of cancer drug selection in a learning-to-rank framework. We formulate the cancer drug selection problem as accurately predicting (1) the ranking positions of sensitive drugs and (2) the ranking orders among sensitive drugs in cancer cell lines, based on their responses to cancer drugs. We have developed a new learning-to-rank method, denoted as pLETORg, that predicts drug ranking structures in each cell line using drug latent vectors and cell line latent vectors. The pLETORg method learns such latent vectors by explicitly enforcing that, in the drug ranking list of each cell line, the sensitive drugs are pushed above insensitive drugs, and that the ranking orders among sensitive drugs are correct. Genomics information on cell lines is leveraged in learning the latent vectors. Our experimental results on a benchmark cell line-drug response dataset demonstrate that the new pLETORg significantly outperforms the state-of-the-art method in prioritizing new sensitive drugs.

Code and data are available here.
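
The role of the latent vectors can be illustrated with a small sketch. It is not pLETORg itself and not the authors' released code: it assumes, for illustration only, that a drug's score in a cell line is the dot product of the two latent vectors, and it counts how often an insensitive drug is ranked above a sensitive one, which is the kind of violation a push term would penalize during training. All names and the toy vectors are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_drugs(cell_vec, drug_vecs):
    """Rank drugs for one cell line by descending latent-vector score."""
    scores = {drug: dot(cell_vec, vec) for drug, vec in drug_vecs.items()}
    # Break score ties by drug name so the ranking is deterministic.
    return sorted(scores, key=lambda drug: (-scores[drug], drug))


def push_violations(ranking, sensitive):
    """Count (sensitive, insensitive) pairs where the insensitive drug
    is ranked above the sensitive one."""
    violations = 0
    insensitive_seen = 0
    for drug in ranking:
        if drug in sensitive:
            violations += insensitive_seen
        else:
            insensitive_seen += 1
    return violations


cell_vec = [1.0, 0.5]
drug_vecs = {
    "d1": [2.0, 0.0],
    "d2": [0.0, 2.0],
    "d3": [1.0, 1.0],
    "d4": [-1.0, 0.0],
}
ranking = rank_drugs(cell_vec, drug_vecs)
violations = push_violations(ranking, sensitive={"d1", "d2"})
```

A learning procedure would adjust the latent vectors to drive the violation count toward zero while also keeping the order among the sensitive drugs correct.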