Self-supervised Learning for Label Sparsity in Computational Drug Repositioning

Computational drug repositioning aims to discover new uses for marketed drugs, which can accelerate drug development and plays an important role in the existing drug discovery system. However, the number of validated drug-disease associations is small compared with the number of drugs and diseases in the real world. With too few labeled samples, a classification model cannot learn effective latent factors of drugs, which results in poor generalization. In this work, we propose a multi-task self-supervised learning framework for computational drug repositioning. The framework tackles label sparsity by learning a better drug representation. Specifically, the main task is drug-disease association prediction, and the auxiliary task uses data augmentation strategies and contrastive learning to mine the internal relationships of the original drug features, so that a better drug representation is learned automatically without supervised labels. Joint training ensures that the auxiliary task improves the prediction accuracy of the main task. More precisely, the auxiliary task improves the drug representation and serves as additional regularization that improves generalization. Furthermore, we design a multi-input decoding network to improve the reconstruction ability of the autoencoder model. We evaluate our model on three real-world datasets. The experimental results demonstrate the effectiveness of the multi-task self-supervised learning framework, and its predictive ability is superior to that of the state-of-the-art model.
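
The joint objective described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the augmentation, the contrastive loss, or the loss weighting, so random feature masking, an InfoNCE-style loss, a toy linear encoder, and the names `mask_features`, `info_nce`, `joint_loss`, and `lam` are all assumptions chosen for illustration. The multi-input decoding network and the autoencoder are omitted.

```python
import numpy as np

def mask_features(x, drop_prob, rng):
    """Assumed augmentation: randomly zero out a fraction of feature entries."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep

def encode(x, w):
    """Toy linear encoder (hypothetical) followed by L2 normalization of each row."""
    z = x @ w
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)

def info_nce(z1, z2, temperature=0.5):
    """InfoNCE-style loss: row i of z1 should match row i of z2 against all other rows."""
    logits = (z1 @ z2.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def bce(y_true, y_prob):
    """Binary cross-entropy averaged over all drug-disease pairs."""
    y_prob = np.clip(y_prob, 1e-12, 1 - 1e-12)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

def joint_loss(drug_x, disease_z, y, w, lam, rng, drop_prob=0.2):
    """Main-task loss plus lam times the auxiliary contrastive loss."""
    # Main task: predict drug-disease associations from the drug representation.
    z = encode(drug_x, w)
    y_prob = 1.0 / (1.0 + np.exp(-(z @ disease_z.T)))
    main = bce(y, y_prob)
    # Auxiliary task: two augmented views of the same drug should agree.
    z1 = encode(mask_features(drug_x, drop_prob, rng), w)
    z2 = encode(mask_features(drug_x, drop_prob, rng), w)
    aux = info_nce(z1, z2)
    return main + lam * aux, main, aux

# Synthetic data only: 8 drugs with 16 features, 5 diseases, sparse labels.
rng = np.random.default_rng(0)
drug_x = rng.random((8, 16))
disease_z = rng.normal(size=(5, 4))
y = (rng.random((8, 5)) < 0.1).astype(float)
w = rng.normal(size=(16, 4))

total, main, aux = joint_loss(drug_x, disease_z, y, w, lam=0.1, rng=rng)
print(f"total={total:.4f} main={main:.4f} aux={aux:.4f}")
```

Because the auxiliary term needs no labels, it contributes a training signal for every drug, including those with no validated associations. In this sketch `lam` controls how strongly that term regularizes the shared encoder.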
