Multi-task learning for molecular property prediction is becoming increasingly important in drug discovery. However, unlike in other domains, the performance of multi-task learning in drug discovery remains unsatisfactory because the amount of labeled data for each task is very limited, which calls for additional data to compensate for this scarcity. In this paper, we study multi-task learning for molecular property prediction in a novel setting where a relation graph between tasks is available. We first construct a dataset (ChEMBL-STRING) that includes around 400 tasks together with a task relation graph. Then, to better utilize this relation graph, we propose a method called SGNN-EBM that systematically investigates structured task modeling from two perspectives. (1) In the \emph{latent} space, we model task representations by applying a state graph neural network (SGNN) to the relation graph. (2) In the \emph{output} space, we employ structured prediction with an energy-based model (EBM), which can be trained efficiently with noise-contrastive estimation (NCE). Empirical results demonstrate the effectiveness of SGNN-EBM. Code is available at https://github.com/chao1224/SGNN-EBM.
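The two perspectives above (task representations propagated over a relation graph, and an energy over joint task labels trained with NCE) can be illustrated with a small pure-Python sketch. It is not the SGNN-EBM implementation from the linked repository. The mean-aggregation message passing (a stand-in for the SGNN), the inner-product task scores, the pairwise energy over binary task labels, the independent-Bernoulli noise distribution, and all function names (`propagate`, `unary_scores`, `energy`, `sample_noise`, `nce_loss`) are hypothetical simplifying assumptions made here for illustration.

```python
import math
import random

def propagate(embeddings, edges, steps=2):
    """Mean-aggregation message passing over an undirected task relation graph.

    Hypothetical stand-in for an SGNN: each step replaces a task's vector with
    the mean of its own vector and its neighbours' vectors.
    """
    neighbours = {t: [] for t in embeddings}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    h = {t: list(v) for t, v in embeddings.items()}
    for _ in range(steps):
        new_h = {}
        for t, v in h.items():
            group = [v] + [h[n] for n in neighbours[t]]
            new_h[t] = [sum(col) / len(group) for col in zip(*group)]
        h = new_h
    return h

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unary_scores(mol_vec, task_emb):
    """Hypothetical per-task score: inner product of a molecule vector and each task embedding."""
    return {t: dot(mol_vec, h) for t, h in task_emb.items()}

def energy(y, unary, pairwise):
    """E(y) = -sum_t unary[t]*y[t] - sum_{(s,t)} w_st*y[s]*y[t], with y[t] in {0, 1}."""
    e = -sum(unary[t] * y[t] for t in y)
    e -= sum(w * y[s] * y[t] for (s, t), w in pairwise.items())
    return e

def sample_noise(probs, rng):
    """Draw a label vector from an independent-Bernoulli noise distribution q (assumed)."""
    return {t: int(rng.random() < p) for t, p in probs.items()}

def log_noise(y, probs):
    """log q(y) under the independent-Bernoulli noise distribution."""
    return sum(math.log(probs[t] if y[t] else 1.0 - probs[t]) for t in y)

def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def nce_loss(y_data, y_noise, unary, pairwise, probs, log_z=0.0):
    """Binary NCE loss with one noise sample per observed label vector.

    The classifier logit is log p_model(y) - log q(y), where
    log p_model(y) = -E(y) - log_z and log_z is a free (learnable) parameter,
    so the partition function over all label vectors is never computed.
    """
    def logit(y):
        return -energy(y, unary, pairwise) - log_z - log_noise(y, probs)
    # log(1 - sigmoid(x)) == log_sigmoid(-x)
    return -(log_sigmoid(logit(y_data)) + log_sigmoid(-logit(y_noise)))
```

For example, `propagate({"a": [1.0], "b": [3.0]}, [("a", "b")], steps=1)` pulls both related tasks to `[2.0]`, and `nce_loss` is smaller when the observed labels have lower energy than the noise labels. In a real model the embeddings, pairwise weights, and `log_z` would be learned by minimizing `nce_loss` averaged over many (data, noise) pairs with an autodiff framework; the sketch only evaluates the objective.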