Accurate prediction of ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties can efficiently screen out undesirable drug candidates in the early stages of drug discovery. In recent years, several comprehensive ADMET systems built on advanced machine learning models have been developed, offering services that estimate multiple endpoints. However, these systems usually suffer from weak extrapolation ability. First, because labelled data for each endpoint are scarce, typical machine learning models perform poorly on molecules with unobserved scaffolds. Second, most systems provide only fixed, built-in endpoints and cannot be customised to satisfy varied research requirements. To address these limitations, we develop a robust and endpoint-extensible ADMET system, HelixADMET (H-ADMET). H-ADMET uses self-supervised learning to produce a robust pre-trained model. The model is then fine-tuned with a multi-task, multi-stage framework that transfers knowledge between ADMET endpoints, auxiliary tasks, and self-supervised tasks. Our results demonstrate that H-ADMET achieves an overall improvement of 4% over existing ADMET systems on comparable endpoints. Additionally, the pre-trained model provided by H-ADMET can be fine-tuned to generate new, customised ADMET endpoints, meeting the varied demands of drug research and development.