DebiasedDTA: Improving the Generalizability of Drug-Target Affinity Prediction Models

Motivation: Computational models that accurately predict the binding affinity of an input protein-chemical pair can accelerate drug discovery studies. These models are trained on available protein-chemical interaction datasets, which may contain dataset biases that lead the model to learn dataset-specific patterns instead of generalizable relationships. As a result, the prediction performance of models drops for previously unseen or novel biomolecules. Here, we present DebiasedDTA, a novel drug-target affinity (DTA) prediction model training framework that addresses dataset biases to improve affinity prediction for novel biomolecules. DebiasedDTA reweights the training samples to mitigate the effect of dataset biases and is applicable to most DTA prediction models.

Results: The results show that DebiasedDTA can improve the prediction performance on interactions between previously unseen molecules. In addition, affinity prediction for previously encountered biomolecules also improves with debiasing. The experiments also show that DebiasedDTA can augment DTA prediction models with different input representations and model structures, and that it can mitigate the effect of various dataset biases. Detailed analysis of the predictions shows that the proposed framework can also help to tackle the problem of insufficient learning from proteins, a problem that is known to be a barrier to achieving generalizable DTA prediction models.

Availability and Implementation: The source code, the models, and the datasets for reproduction are freely available for download at https://github.com/boun-tabi/debiaseddta-reproduce. The implementation is in Python 3 and is supported on Linux, macOS, and MS Windows.

Contact: arzucan.ozgur@boun.edu.tr, elif.ozkirimli@roche.com
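The idea of reweighting training samples can be illustrated with a small, generic sketch. The abstract does not say how DebiasedDTA computes its weights, so everything below is an assumption made for illustration: the helper names (`error_based_weights`, `weighted_mse`, `fit_weighted_linear`) are hypothetical, the weights are taken to be the squared errors of some weak baseline model, and the predictor is a plain weighted least-squares fit rather than a DTA model.

```python
import numpy as np


def error_based_weights(y_true, y_weak_pred):
    """Hypothetical weighting rule (an assumption, not taken from the abstract):
    samples that a weak baseline predicts poorly receive larger weights.
    Weights are rescaled so that their mean is 1."""
    err = (np.asarray(y_true, dtype=float) - np.asarray(y_weak_pred, dtype=float)) ** 2
    total = err.sum()
    if total == 0:
        # The weak baseline is perfect: fall back to uniform weights.
        return np.ones_like(err)
    return err / total * len(err)


def weighted_mse(y_true, y_pred, weights):
    """Mean squared error where each sample contributes according to its weight."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * (y_true - y_pred) ** 2) / np.sum(weights))


def fit_weighted_linear(X, y, weights):
    """Weighted least squares with an intercept, used here as a stand-in
    for any affinity predictor that accepts per-sample weights."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sqrt_w = np.sqrt(np.asarray(weights, dtype=float))
    X_bias = np.hstack([X, np.ones((X.shape[0], 1))])
    # Scaling rows by sqrt(weight) turns weighted least squares into ordinary least squares.
    coef, *_ = np.linalg.lstsq(X_bias * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef  # feature coefficients followed by the intercept


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))                    # toy features, not real biomolecule data
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
    weak_pred = np.full_like(y, y.mean())           # weak baseline: predict the mean
    w = error_based_weights(y, weak_pred)
    coef = fit_weighted_linear(X, y, w)
    y_hat = np.hstack([X, np.ones((50, 1))]) @ coef
    print("weighted MSE:", weighted_mse(y, y_hat, w))
```

In this sketch the weights only change how much each training sample contributes to the loss; the model and the data stay the same, which is why a reweighting scheme of this kind can, in principle, be attached to many different predictors.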

