Molecular properties indicate molecular function and are therefore useful in many applications. With advances in deep learning, computational approaches to molecular property prediction are gaining momentum. However, customized, advanced methods and comprehensive tools for this task are currently lacking. Here we develop a suite of machine learning methods and tools for molecular property prediction and drug discovery, spanning different computational models, molecular representations, and loss functions. Specifically, we represent molecules as both graphs and sequences and, building on these representations, develop novel deep models for learning from each. To learn effectively from highly imbalanced datasets, we develop advanced loss functions that optimize the area under the precision-recall curve. Altogether, our work not only serves as a comprehensive tool but also contributes novel and advanced graph and sequence learning methodologies. Results on both online and offline antibiotics discovery and molecular property prediction tasks show that our methods achieve consistent improvements over prior methods. In particular, our methods achieve the #1 ranking in terms of both ROC-AUC and PRC-AUC on the AI Cures Open Challenge for drug discovery related to COVID-19. Our software is released as part of the MoleculeX library under AdvProp.
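To illustrate why the abstract singles out precision-recall curves for highly imbalanced data, the following minimal sketch computes both metrics on a toy ranking. It is not the loss function developed in this work and does not use the MoleculeX API: `average_precision` is the standard step-wise estimate of the area under the precision-recall curve, `roc_auc` is the standard pairwise-ranking form of ROC-AUC, and the function names, labels, and scores are all hypothetical values chosen for illustration.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision@k over the ranks k that hold a positive.

    This is a common estimate of the area under the precision-recall curve.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Ties between a positive and a negative count as half a correct pair.
    correct = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


# Hypothetical toy data: 98 inactive molecules and 2 active ones (2% positives).
neg_scores = [i / 100 for i in range(98)]   # 0.00, 0.01, ..., 0.97
pos_scores = [0.955, 0.905]                 # actives ranked 3rd and 9th overall
labels = [0] * len(neg_scores) + [1] * len(pos_scores)
scores = neg_scores + pos_scores

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(f"ROC-AUC = {auc:.3f}, average precision = {ap:.3f}")
```

On this toy ranking the ROC-AUC is about 0.95 while the average precision is only about 0.28. A handful of negatives ranked above the positives barely moves ROC-AUC when negatives are abundant, but it sharply lowers precision, which is one reason to optimize the precision-recall area directly in the imbalanced setting.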