Motivation: Computational models that accurately predict the binding affinity of an input protein-chemical pair can accelerate drug discovery studies. These models are trained on available protein-chemical interaction datasets, which may contain dataset biases that lead the model to learn dataset-specific patterns instead of generalizable relationships. As a result, the prediction performance of these models drops for previously unseen or novel biomolecules. Here, we present DebiasedDTA, a novel training framework for drug-target affinity (DTA) prediction models that addresses dataset biases to improve affinity prediction for novel biomolecules. DebiasedDTA reweights the training samples to mitigate the effect of dataset biases and is applicable to most DTA prediction models.

Results: The results show that DebiasedDTA can improve prediction performance on interactions between previously unseen molecules. Affinity prediction for previously encountered biomolecules also improves with debiasing. The experiments further show that DebiasedDTA can augment DTA prediction models with different input and model structures and can mitigate the effect of various dataset biases. Detailed analysis of the predictions shows that the proposed framework can also help to tackle insufficient learning from proteins, a known barrier to achieving generalizable DTA prediction models.

Availability and Implementation: The source code, the models, and the datasets for reproduction are freely available for download at https://github.com/boun-tabi/debiaseddta-reproduce. The framework is implemented in Python 3 and supported on Linux, macOS, and MS Windows.

Contact: arzucan.ozgur@boun.edu.tr, elif.ozkirimli@roche.com