Discovering drug-target interactions (DTIs) is a promising area of research. Accurately identifying reliable interactions between drugs and proteins with computational methods, which typically leverage heterogeneous information retrieved from diverse data sources, can accelerate the development of effective pharmaceuticals. Although random walk and matrix factorization techniques are widely used in DTI prediction, they have several limitations: random walk-based embedding generation is usually conducted in an unsupervised manner, while the linear combination of similarities in matrix factorization distorts the individual insights offered by different views. To tackle these issues, we take a multi-layered network approach to handle diverse drug and target similarities, and propose a novel optimization framework, called Multiple similarity DeepWalk-based Matrix Factorization (MDMF), for DTI prediction. The framework unifies embedding generation and interaction prediction: it learns vector representations of drugs and targets that not only preserve higher-order proximity across all hyper-layers and layer-specific local invariance, but also approximate the interactions through their inner product. Furthermore, we develop an ensemble method (MDMF2A) that integrates two instantiations of the MDMF model, one optimizing the area under the precision-recall curve (AUPR) and the other the area under the receiver operating characteristic curve (AUC). An empirical study on real-world DTI datasets shows that our method achieves statistically significant improvements over current state-of-the-art approaches in four different settings. Moreover, validating highly ranked drug-target pairs that were not previously known to interact demonstrates the potential of MDMF2A to discover novel DTIs.
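The abstract does not state MDMF's actual objective, so the following is only a minimal sketch of the general idea, under stated assumptions. It relies on the known NetMF result (Qiu et al., WSDM 2018) that DeepWalk with window size T and b negative samples implicitly factorizes the matrix log(vol(G)/(bT) · (Σ_{r=1..T} P^r) D^{-1}), where P = D^{-1}S is the random-walk transition matrix of a similarity layer S. The toy joint loss below fits one DeepWalk matrix per similarity layer (instead of linearly merging the similarities first) while the inner product U Vᵀ approximates the interaction matrix Y. The function names `deepwalk_matrix` and `fit`, the weight `alpha`, and all hyperparameters are hypothetical. The sketch does not reproduce MDMF's hyper-layer construction, its local-invariance terms, or the AUPR/AUC-oriented losses used in MDMF2A.

```python
import numpy as np


def deepwalk_matrix(S, window=5, neg=1.0):
    """Closed-form matrix that DeepWalk implicitly factorizes (NetMF form):
    log(max(vol(G) / (neg * window) * (sum_{r=1..window} P^r) D^{-1}, 1)),
    where P = D^{-1} S is the transition matrix of similarity layer S."""
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)                      # weighted degrees (assumed > 0)
    vol = d.sum()                          # volume of the graph
    P = S / d[:, None]                     # row-normalised transition matrix
    acc = np.zeros_like(S)
    Pr = np.eye(S.shape[0])
    for _ in range(window):                # accumulate P^1 + ... + P^window
        Pr = Pr @ P
        acc += Pr
    M = (vol / (neg * window)) * acc / d[None, :]  # right-multiply by D^{-1}
    return np.log(np.maximum(M, 1.0))      # truncated logarithm


def fit(Y, drug_layers, target_layers, dim=4, alpha=0.1, lr=0.005,
        epochs=500, seed=0):
    """Hypothetical toy objective (NOT the published MDMF loss):
    ||Y - U V^T||^2 + alpha * sum_l ||M_l - U U^T||^2  (drug layers)
                    + alpha * sum_k ||M_k - V V^T||^2  (target layers),
    minimised by plain gradient descent."""
    rng = np.random.default_rng(seed)
    n_d, n_t = Y.shape
    U = 0.1 * rng.standard_normal((n_d, dim))   # drug embeddings
    V = 0.1 * rng.standard_normal((n_t, dim))   # target embeddings
    Md = [deepwalk_matrix(S) for S in drug_layers]   # one matrix per layer
    Mt = [deepwalk_matrix(S) for S in target_layers]

    def loss(U, V):
        val = np.sum((Y - U @ V.T) ** 2)
        val += alpha * sum(np.sum((M - U @ U.T) ** 2) for M in Md)
        val += alpha * sum(np.sum((M - V @ V.T) ** 2) for M in Mt)
        return val

    history = [loss(U, V)]
    for _ in range(epochs):
        R = Y - U @ V.T                    # interaction residual
        gU = -2 * R @ V
        gV = -2 * R.T @ U
        for M in Md:                       # M is symmetric: grad = -4 (M - UU^T) U
            gU += -4 * alpha * (M - U @ U.T) @ U
        for M in Mt:
            gV += -4 * alpha * (M - V @ V.T) @ V
        U, V = U - lr * gU, V - lr * gV
        history.append(loss(U, V))
    return U, V, history


# Usage on random toy data (no real DTI dataset is implied).
rng = np.random.default_rng(1)


def sym(n):
    """Toy symmetric similarity layer with positive entries."""
    A = rng.random((n, n))
    return (A + A.T) / 2


Y = (rng.random((6, 5)) < 0.3).astype(float)   # toy binary interaction matrix
U, V, hist = fit(Y, [sym(6), sym(6)], [sym(5)])
scores = U @ V.T                               # inner product = predicted score
```

Keeping a separate reconstruction term per layer is one simple way to avoid collapsing several similarity views into a single weighted sum before embedding; how MDMF actually couples the layers is described in the full paper, not here.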