Activity cliffs (ACs), generally defined as pairs of structurally similar molecules that are active against the same biological target but differ significantly in binding potency, are of great importance to drug discovery. To date, the AC prediction problem, i.e., predicting whether a pair of molecules exhibits the AC relationship, has not been fully explored. In this paper, we first introduce ACNet, a large-scale dataset for AC prediction. ACNet curates over 400K Matched Molecular Pairs (MMPs) against 190 targets, including over 20K MMP-cliffs and 380K non-AC MMPs, and provides five subsets for model development and evaluation. We then propose a baseline framework to benchmark the predictive performance of molecular representations encoded by deep neural networks for AC prediction, and evaluate 16 models in experiments. Our experimental results show that deep learning models can achieve good performance when trained on tasks with an adequate amount of data, while the imbalanced, low-data, and out-of-distribution characteristics of the ACNet dataset remain challenging for deep neural networks. In addition, the traditional ECFP method shows a natural advantage on MMP-cliff prediction and outperforms the deep learning models on most of the data subsets. To the best of our knowledge, our work constructs the first large-scale dataset for AC prediction, which may stimulate the study of AC prediction models and prompt further breakthroughs in AI-aided drug discovery. The code and dataset are available at https://drugai.github.io/ACNet/.
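
As an illustration of the AC definition above, the following minimal sketch labels a pair of molecules as a cliff or non-cliff from their measured potencies. The abstract does not state a numeric criterion, so the 100-fold potency-difference threshold used here is an assumption (a convention often seen in the MMP-cliff literature), and the names `MoleculePair` and `is_activity_cliff` as well as the potency values are hypothetical. The sketch also assumes the pair is already known to be structurally similar (e.g. it forms an MMP) and that both molecules were measured against the same target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoleculePair:
    """A structurally similar pair measured against the same target (hypothetical container)."""
    smiles_a: str
    smiles_b: str
    potency_a_nm: float  # potency of molecule A, e.g. Ki in nM (illustrative units)
    potency_b_nm: float  # potency of molecule B, same units as A


def is_activity_cliff(pair: MoleculePair, min_fold_change: float = 100.0) -> bool:
    """Return True if the potencies differ by at least `min_fold_change`.

    The default of 100-fold is an assumed threshold, not one taken from the abstract.
    """
    if pair.potency_a_nm <= 0 or pair.potency_b_nm <= 0:
        raise ValueError("potencies must be positive")
    high = max(pair.potency_a_nm, pair.potency_b_nm)
    low = min(pair.potency_a_nm, pair.potency_b_nm)
    return high / low >= min_fold_change


# Illustrative values only; these are not measurements from ACNet.
cliff_pair = MoleculePair("c1ccccc1O", "c1ccccc1N", 10.0, 1000.0)
non_cliff_pair = MoleculePair("c1ccccc1O", "c1ccccc1N", 10.0, 50.0)

print(is_activity_cliff(cliff_pair))
print(is_activity_cliff(non_cliff_pair))
```

Under this assumed criterion, AC prediction becomes a binary classification task over pairs, which is consistent with the abstract's split into MMP-cliffs and non-AC MMPs.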