Genes are fundamental to the analysis of biological systems, and many recent works apply deep learning models to gene expression data for a range of biological tasks. Despite their promising performance, deep neural networks are black boxes and therefore offer little biological insight. Some recent works integrate biological knowledge into neural networks to improve both transparency and performance. However, these methods incorporate only part of the available biological knowledge, which leads to suboptimal performance. In this paper, we propose the Biological Factor Regulatory Neural Network (BFReg-NN), a generic framework for modeling relations among biological factors in cell systems. BFReg-NN takes gene expression data as input and can incorporate most existing biological knowledge, including regulatory relations among genes or proteins (e.g., gene regulatory networks (GRN) and protein-protein interaction (PPI) networks) and hierarchical relations among genes, proteins, and pathways (e.g., a pathway contains several genes or proteins). Moreover, because it is a white-box model, BFReg-NN can also provide new, biologically meaningful insights. Experimental results on several gene expression-based tasks show that BFReg-NN outperforms the baselines. Our case studies also show that the key insights found by BFReg-NN are consistent with the biological literature.
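To make the idea of building prior knowledge into the network concrete, the following is a minimal sketch of one common way to do so: masking a layer's weights so that signal flows only along known edges. It is an illustration under stated assumptions, not the BFReg-NN implementation. The function `masked_layer`, the toy masks `grn_mask` and `pathway_mask`, and all sizes are hypothetical.

```python
import numpy as np

def masked_layer(x, weights, mask):
    """Propagate values only along edges permitted by prior knowledge.

    mask[i, j] = 1 means source factor i may influence target factor j.
    Weights on all other edges are zeroed out before the matrix product.
    """
    return np.tanh(x @ (weights * mask))

rng = np.random.default_rng(0)
n_samples, n_genes, n_pathways = 3, 4, 2

# Hypothetical regulatory relations (GRN-like): gene i regulates gene j.
# Here: gene 0 -> gene 1 -> gene 2 -> gene 3; gene 0 has no regulator.
grn_mask = np.array([[0, 1, 0, 0],
                     [0, 0, 1, 0],
                     [0, 0, 0, 1],
                     [0, 0, 0, 0]], dtype=float)

# Hypothetical hierarchical relations: gene i is contained in pathway p.
pathway_mask = np.array([[1, 0],
                         [1, 0],
                         [0, 1],
                         [0, 1]], dtype=float)

w_grn = rng.normal(size=(n_genes, n_genes))
w_path = rng.normal(size=(n_genes, n_pathways))

# Toy gene expression matrix: rows are samples, columns are genes.
expression = rng.normal(size=(n_samples, n_genes))

gene_state = masked_layer(expression, w_grn, grn_mask)         # gene -> gene
pathway_state = masked_layer(gene_state, w_path, pathway_mask) # gene -> pathway
```

Because every nonzero weight corresponds to a named relation, each learned weight can be read as the strength of a specific gene-gene or gene-pathway link, which is the kind of transparency the abstract refers to as white-box.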