Design, synthesis and experimental validation of novel potential chemopreventive agents using random forest and support vector machine binary classifiers
Authors: Brienne Sprague, Qian Shi, Marlene T. Kim, Liying Zhang, Alexander Sedykh, Eiichiro Ichiishi, Harukuni Tokuda, Kuo-Hsiung Lee, Hao Zhu
DOI: 10.1007/s10822-014-9748-9
Date: June 2014
Compared with what is known about cancer chemotherapeutic agents, only limited information is available on the ability of organic compounds, such as drugs and/or natural products, to prevent or delay the onset of cancer. To evaluate the chemopreventive potential of chemicals and to design novel chemopreventive agents with low to no toxicity, we developed predictive computational models for chemopreventive agents in this study. First, we curated a database containing over 400 organic compounds with known chemopreventive activities. Based on this database, various random forest and support vector machine binary classifiers were developed. All of the resulting models were validated by cross-validation procedures. The validated models were then used to virtually screen a chemical library containing around 23,000 natural products and derivatives. Based on the consensus prediction of all validated models, we selected 148 compounds predicted to be novel chemopreventive agents. We further analyzed the predicted active compounds by their ease of organic synthesis. Finally, 18 compounds were synthesized and experimentally tested for their chemopreventive activity. The experimental results paralleled the cross-validation results, demonstrating the utility of the developed models. The predictive models developed in this study can be applied to virtually screen other chemical libraries to identify novel lead compounds for the chemoprevention of cancers.
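The consensus step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact consensus rule, so this example assumes unanimous agreement: a compound is kept only if every model labels it active. The function name `consensus_actives`, the model names, and the compound IDs are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def consensus_actives(predictions: Dict[str, Dict[str, int]]) -> List[str]:
    """Return the compounds that every model labels active.

    `predictions` maps a model name to {compound_id: label}, where
    label 1 means predicted active and 0 means predicted inactive.
    Assumption: consensus means unanimous agreement across models.
    Compounds not scored by every model are excluded.
    """
    if not predictions:
        return []
    per_model = list(predictions.values())
    # Only compounds scored by every model can reach consensus.
    shared = set(per_model[0])
    for labels in per_model[1:]:
        shared &= set(labels)
    return sorted(
        cid for cid in shared if all(labels[cid] == 1 for labels in per_model)
    )


# Hypothetical labels from two classifiers for four made-up compound IDs.
example = {
    "random_forest": {"cpd_a": 1, "cpd_b": 1, "cpd_c": 0, "cpd_d": 1},
    "svm": {"cpd_a": 1, "cpd_b": 0, "cpd_c": 0, "cpd_d": 1},
}
hits = consensus_actives(example)
print(hits)  # ['cpd_a', 'cpd_d']
```

A unanimous rule favors precision over recall, which fits a screen in which only a small number of hits can be synthesized and tested. A majority-vote rule would be a looser alternative.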