Evaluating performance of chemical fingerprinting methods and machine learning algorithms for in silico prediction of Ames mutagenicity
The Office of Food Additive Safety (OFAS) at the U.S. FDA’s Center for Food Safety and Applied Nutrition is responsible for ensuring the safety of all food additives used in the United States. Current research efforts at OFAS focus on building in-house predictive models of mutagenicity and carcinogenicity that achieve high accuracy for food-related chemicals. Here we evaluate publicly available chemical fingerprinting methods and machine learning algorithms and compare their performance for in silico prediction of Ames mutagenicity. We evaluated six fingerprinting methods (MACCS keys, RDKit, Circular, ToxPrint, PubChem, and Atom pairs) and six machine learning algorithms (k-Nearest Neighbors, Decision Trees, Random Forest, Artificial Neural Networks, Support Vector Machines, and Naïve Bayes). QSAR models were developed for all 36 combinations of fingerprint and machine learning algorithm, and performance metrics were calculated using the Hansen benchmark dataset. Combinations of knowledge-enriched fingerprints and deep learning algorithms gave the best-performing models for Ames mutagenicity. Some of these models were then evaluated against empirical data on food-related chemicals in CERES, the OFAS food additive knowledgebase. These models gave good overall predictive performance and high accuracy in predicting non-mutagenic compounds. More research is needed to improve the prediction of mutagenic compounds in the food-related chemical space.
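The evaluation design described above, every fingerprint paired with every algorithm and each pairing scored on a labeled dataset, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `model_grid` and `classification_metrics` are hypothetical, and the choice of accuracy, sensitivity, and specificity is an assumption, since the abstract does not name the metrics that were calculated. Sensitivity here corresponds to correctly predicting mutagenic compounds, and specificity to correctly predicting non-mutagenic ones.

```python
from itertools import product

# Names taken from the abstract; the pairing logic is illustrative only.
FINGERPRINTS = ["MACCS keys", "RDKit", "Circular", "ToxPrint", "PubChem", "Atom pairs"]
ALGORITHMS = [
    "k-Nearest Neighbors", "Decision Trees", "Random Forest",
    "Artificial Neural Networks", "Support Vector Machines", "Naive Bayes",
]


def model_grid():
    """Return every (fingerprint, algorithm) pairing as a list of tuples."""
    return list(product(FINGERPRINTS, ALGORITHMS))


def classification_metrics(y_true, y_pred):
    """Compute binary classification metrics (assumed set, not from the source).

    Labels: 1 = mutagenic (Ames positive), 0 = non-mutagenic.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Undefined ratios (empty class or empty input) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # recall on mutagenic compounds
        "specificity": ratio(tn, tn + fp),  # recall on non-mutagenic compounds
    }


if __name__ == "__main__":
    print(len(model_grid()), "fingerprint/algorithm combinations")
    # Toy labels for illustration only; not data from the study.
    print(classification_metrics([1, 1, 0, 0, 0, 1], [1, 0, 0, 0, 1, 1]))
```

Reporting sensitivity and specificity separately, rather than accuracy alone, makes visible the kind of asymmetry the abstract notes, where non-mutagenic compounds are predicted more reliably than mutagenic ones.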