Machine learning, and representation learning in particular, has the potential to facilitate drug discovery by screening billions of compounds. For example, one successful approach represents molecules as graphs and applies graph neural networks (GNNs). Yet these approaches still require experimental measurements of thousands of compounds to construct a proper training set. Experimental data are easier to acquire in some domains than in others. For example, it is easier to test compounds on bacteria than to perform in vivo experiments. Thus, a key question is how to combine information from a large available dataset with a small subset of compounds measured in both domains, in order to predict the compounds' effect in the second, experimentally less accessible domain. Current transfer learning approaches for drug discovery, including training of pre-trained modules or meta-learning, have had limited success. In this work, we develop a novel method, named Symbiotic Message Passing Neural Network (SMPNN), for merging graph-neural-network models from different domains. By routing new message-passing lanes between the models, our approach resolves some of the potential conflicts between the different domains, as well as implicit constraints induced by the larger datasets. By collecting public data and performing additional high-throughput experiments, we demonstrate the advantage of our approach by predicting anti-fungal activity from anti-bacterial activity. We compare our method to the standard transfer learning approach and show that SMPNN provides better and less variable performance. Our approach is general and can be used to facilitate information transfer between any two domains, such as different organisms, different organelles, or different environments.
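To make the idea of a message-passing lane between two domain models concrete, the following is a minimal NumPy sketch. It is not the SMPNN architecture, which the abstract does not specify. The function names (`message_pass`, `symbiotic_step`), the one-directional lane from the data-rich (source) model to the data-poor (target) model, the sum aggregation, and the toy graph are all assumptions made for illustration.

```python
import numpy as np


def message_pass(adj, h, w):
    """One plain message-passing step: sum neighbor states, linear map, ReLU."""
    return np.maximum(adj @ h @ w, 0.0)


def symbiotic_step(adj, h_src, h_tgt, w_src, w_tgt, w_lane):
    """Hypothetical cross-domain step (an assumption, not the paper's definition).

    The source-domain model updates as usual. The target-domain model also
    receives the source model's node states through an extra lane, `w_lane`.
    """
    new_src = message_pass(adj, h_src, w_src)
    lane = adj @ h_src @ w_lane  # messages routed from source to target
    new_tgt = np.maximum(adj @ h_tgt @ w_tgt + lane, 0.0)
    return new_src, new_tgt


rng = np.random.default_rng(0)

# Toy 4-atom chain molecule: adjacency matrix with self-loops.
adj = np.array(
    [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 1, 1, 1],
     [0, 0, 1, 1]],
    dtype=float,
)
n_atoms, dim = 4, 8

# Random node states and weights stand in for learned atom features.
h_src = rng.normal(size=(n_atoms, dim))
h_tgt = rng.normal(size=(n_atoms, dim))
w_src, w_tgt, w_lane = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))

for _ in range(3):  # three rounds of message passing
    h_src, h_tgt = symbiotic_step(adj, h_src, h_tgt, w_src, w_tgt, w_lane)

# Sum-pool node states into one graph-level vector per domain.
graph_src, graph_tgt = h_src.sum(axis=0), h_tgt.sum(axis=0)
print(graph_src.shape, graph_tgt.shape)
```

In this sketch, setting `w_lane` to zero reduces the target model to an ordinary, independent message-passing network, so the lane is the only channel through which source-domain information reaches the target domain.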