Proteins perform a wide variety of functions in living organisms and therefore play a key role in biology. However, existing learning algorithms for protein data ignore several particularities of such data, do not scale well to large protein conformations, or both. To fill this gap, we propose two new learning operations that enable deep 3D analysis of large-scale protein data. First, we introduce a novel convolution operator that considers both the intrinsic structure (invariant under protein folding) and the extrinsic structure (invariant under bonding) by using $n$-D convolutions defined on the Euclidean distance as well as on multiple geodesic distances between atoms in a multi-graph. Second, we enable multi-scale protein analysis by introducing hierarchical pooling operators, which exploit the fact that proteins are recombinations of a finite set of amino acids and can therefore be pooled using shared pooling matrices. Lastly, we evaluate the accuracy of our algorithms on several large-scale data sets for common protein analysis tasks, where we outperform state-of-the-art methods.
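To make the distinction between extrinsic and intrinsic distances concrete, the following minimal Python sketch computes, for a toy four-atom chain, the Euclidean distance between atoms together with two geodesic (hop-count) distances on a multi-graph. It is an illustration under stated assumptions, not the operator proposed here: the function names (`euclidean_distances`, `geodesic_hops`, `kernel_inputs`), the toy coordinates, the labelling of the two edge sets as covalent and hydrogen bonds, and the use of hop counts as the geodesic measure are all hypothetical choices made for the example.

```python
import math
from collections import deque


def euclidean_distances(coords):
    """Pairwise Euclidean (extrinsic) distances between atom positions."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def geodesic_hops(n_atoms, bonds):
    """Pairwise hop-count (intrinsic) distances on an undirected bond graph.

    Runs a breadth-first search from every atom; unreachable pairs keep
    distance math.inf.
    """
    adjacency = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    dist = [[math.inf] * n_atoms for _ in range(n_atoms)]
    for src in range(n_atoms):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[src][v] == math.inf:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


# Toy chain of four atoms, folded so that the two ends are close in space.
# Coordinates and bond lists are made up for illustration.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
covalent_bonds = [(0, 1), (1, 2), (2, 3)]  # assumed first edge set of the multi-graph
hydrogen_bonds = [(0, 3)]                  # assumed second edge set of the multi-graph

euclid = euclidean_distances(coords)
geo_covalent = geodesic_hops(len(coords), covalent_bonds)
geo_all = geodesic_hops(len(coords), covalent_bonds + hydrogen_bonds)


def kernel_inputs(i, j):
    """Distances that an n-D convolution kernel could take as input for atoms i, j."""
    return (euclid[i][j], geo_covalent[i][j], geo_all[i][j])


print(kernel_inputs(0, 3))
```

For the two chain ends, `kernel_inputs(0, 3)` yields `(1.0, 3, 1)`: the atoms are one unit apart in space, three hops apart along the first edge set, and one hop apart once the second edge set is included. Changing the coordinates (refolding) alters only the first entry, while changing the edge sets alters only the last two, which is the sense in which the two kinds of distance carry complementary information.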