Predicting drug-target interactions (DTI) via reliable computational methods is an effective and efficient way to mitigate the enormous costs and time of the drug discovery process. Structure-based drug similarities and sequence-based target protein similarities are commonly used information for DTI prediction. Among numerous computational methods, neighborhood-based chemogenomic approaches that leverage drug and target similarities to perform predictions directly are simple but promising. However, most existing similarity-based methods follow the transductive setting. These methods cannot directly generalize to unseen data because they must be rebuilt to predict the interactions for newly arriving drugs, targets, or drug-target pairs. Besides, many similarity-based methods, especially neighborhood-based ones, cannot directly handle all three types of interaction prediction. Furthermore, the large number of missing interactions in current DTI datasets hinders most DTI prediction methods. To address these issues, we propose a new method denoted as Weighted k Nearest Neighbor with Interaction Recovery (WkNNIR). Not only can WkNNIR estimate interactions of any new drugs and/or new targets without any need for retraining, but it can also recover missing interactions. In addition, WkNNIR exploits local imbalance to promote the influence of more reliable similarities on the DTI prediction process. We also propose a series of ensemble methods that employ diverse sampling strategies and can be coupled with WkNNIR, as well as with any other DTI prediction method, to improve performance. Experimental results over five benchmark datasets demonstrate the effectiveness of our approaches in predicting drug-target interactions. Lastly, we confirm the practical ability of the proposed methods to discover reliable interactions that are not reported in the original benchmark datasets.
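To make the neighborhood-based idea concrete, the following is a minimal sketch of a generic similarity-weighted k-nearest-neighbor predictor for a single unseen drug. It is not the WkNNIR algorithm itself: it omits interaction recovery and the local imbalance weighting, and the function name `wknn_predict_new_drug`, the `decay` parameter, and the rank-decay weighting scheme are assumptions chosen for illustration only.

```python
import numpy as np


def wknn_predict_new_drug(sim_to_train, Y_train, k=3, decay=0.8):
    """Score every target for one unseen drug (illustrative sketch only).

    sim_to_train: shape (n_drugs,), similarity of the new drug to each training drug.
    Y_train:      shape (n_drugs, n_targets), binary drug-target interaction matrix.
    k, decay:     hypothetical hyperparameters (neighborhood size, rank decay).
    """
    sim_to_train = np.asarray(sim_to_train, dtype=float)
    Y_train = np.asarray(Y_train, dtype=float)
    k = min(k, sim_to_train.shape[0])

    # Indices of the k most similar training drugs, most similar first.
    neighbors = np.argsort(-sim_to_train, kind="stable")[:k]

    # Each neighbor's weight is its similarity, damped by its rank.
    weights = decay ** np.arange(k) * sim_to_train[neighbors]
    total = weights.sum()
    if total == 0:
        # No informative neighbors: return neutral (all-zero) scores.
        return np.zeros(Y_train.shape[1])

    # Weighted average of the neighbors' interaction profiles.
    return weights @ Y_train[neighbors] / total
```

Because the scores depend only on the stored training interactions and the similarities of the new drug to the training drugs, no model has to be rebuilt when a new drug arrives. The same scheme can be applied to a new target by transposing the interaction matrix and using target similarities.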