Anti-cancer drug discoveries have been serendipitous; we therefore present the Open Molecular Graph Learning Benchmark, named CandidateDrug4Cancer, a challenging and realistic benchmark dataset designed to facilitate scalable, robust, and reproducible graph machine learning research for anti-cancer drug discovery. The CandidateDrug4Cancer dataset encompasses the 29 most frequently mentioned cancer targets and covers 54,869 cancer-related drug molecules, ranging from pre-clinical and clinical candidates to FDA-approved drugs. Besides building the dataset, we perform benchmark experiments with effective Drug-Target Interaction (DTI) prediction baselines based on descriptors and expressive graph neural networks. Experimental results suggest that CandidateDrug4Cancer poses significant challenges for learning on molecular graphs and targets in practical applications, indicating opportunities for future research on developing candidate drugs for treating cancers.
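The graph-neural-network DTI baselines mentioned above pair a learned molecule representation with a target representation and score the pair. As a rough illustration only, and not the benchmark's actual models, the pure-Python sketch below shows one such pipeline under simplifying assumptions: mean-aggregation message passing over a toy atom graph, mean-pool readout to a molecule vector, concatenation with a target embedding, and a logistic interaction score. The toy graph, feature vectors, target embedding, weights, and the function names `message_passing`, `readout`, and `dti_score` are all hypothetical; a real baseline would learn its weights and use richer atom, bond, and target features.

```python
import math
from typing import List, Tuple

Vector = List[float]


def message_passing(features: List[Vector], edges: List[Tuple[int, int]], rounds: int = 2) -> List[Vector]:
    """Hypothetical mean-aggregation message passing: each atom averages its
    own vector with its bonded neighbours' vectors, repeated `rounds` times."""
    n = len(features)
    neighbours = {i: [i] for i in range(n)}  # include a self-loop for each atom
    for u, v in edges:  # bonds are treated as undirected
        neighbours[u].append(v)
        neighbours[v].append(u)
    h = [list(f) for f in features]
    for _ in range(rounds):
        h = [
            [sum(h[j][k] for j in neighbours[i]) / len(neighbours[i]) for k in range(len(h[i]))]
            for i in range(n)
        ]
    return h


def readout(h: List[Vector]) -> Vector:
    """Mean-pool atom embeddings into a single molecule-level vector."""
    dim = len(h[0])
    return [sum(row[k] for row in h) / len(h) for k in range(dim)]


def dti_score(mol_vec: Vector, target_vec: Vector, weights: Vector, bias: float = 0.0) -> float:
    """Logistic score on the concatenated molecule and target vectors
    (weights here are placeholders, not trained parameters)."""
    x = mol_vec + target_vec
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical 3-atom chain with 2-dimensional atom features.
    atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    bonds = [(0, 1), (1, 2)]
    mol = readout(message_passing(atoms, bonds))
    target = [0.2, -0.4]  # hypothetical target embedding
    print(dti_score(mol, target, weights=[0.5, 0.5, 1.0, 1.0]))
```

Mean aggregation is chosen here only because it is the simplest permutation-invariant neighbourhood update; the "expressive" GNNs referred to in the abstract would typically use learned, nonlinear updates instead.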