Predicting the binding sites of target proteins plays a fundamental role in drug discovery. Most existing deep-learning methods treat a protein as a 3D image: they spatially cluster its atoms into voxels and feed the voxelized protein into a 3D CNN for prediction. However, CNN-based methods suffer from several critical issues: 1) they represent irregular protein structures poorly; 2) they are sensitive to rotations; 3) they characterize the protein surface insufficiently; and 4) they are unaware of data distribution shift. To address these issues, this work proposes EquiPocket, an E(3)-equivariant Graph Neural Network (GNN) for binding site prediction. EquiPocket consists of three modules: the first extracts local geometric information for each surface atom, the second models both the chemical and spatial structure of the protein, and the third captures the geometry of the surface via equivariant message passing over the surface atoms. We further propose a dense attention output layer to alleviate the data distribution shift caused by variable protein size. Extensive experiments on several representative benchmarks demonstrate the superiority of our framework over state-of-the-art methods.
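To make the notion of E(3)-equivariant message passing concrete, the following is a minimal sketch of one simplified message-passing step in the general style of EGNN-like layers. It is not EquiPocket's actual layer: the function names `egnn_step` and `rigid_transform`, the scalar node features, and the fixed constants `w_msg` and `w_pos` (stand-ins for learned weights) are all assumptions made for illustration. The point it shows is that messages built only from invariant quantities (features and squared distances) yield invariant feature updates, while coordinate updates along relative position vectors rotate and translate together with the input.

```python
import math


def egnn_step(h, x, edges, w_msg=0.5, w_pos=0.1):
    """One simplified E(3)-equivariant message-passing step (illustrative only).

    h     : list of scalar node features (invariant under rotation/translation)
    x     : list of 3D coordinates, one [x, y, z] list per node
    edges : list of directed (i, j) pairs; node i receives a message from node j
    w_msg, w_pos : hypothetical fixed constants standing in for learned weights
    """
    h_agg = [0.0] * len(h)
    x_new = [list(p) for p in x]
    for i, j in edges:
        diff = [a - b for a, b in zip(x[i], x[j])]   # relative position x_i - x_j
        d2 = sum(c * c for c in diff)                # squared distance (invariant)
        m = math.tanh(w_msg * (h[i] + h[j]) - d2)    # message from invariants only
        h_agg[i] += m
        for k in range(3):
            # Moving along the relative vector keeps the update equivariant.
            x_new[i][k] += w_pos * m * diff[k]
    h_new = [hi + ai for hi, ai in zip(h, h_agg)]
    return h_new, x_new


def rigid_transform(p, angle=0.7, shift=(1.0, -2.0, 0.5)):
    """Rotate a point about the z-axis by `angle` radians, then translate it."""
    c, s = math.cos(angle), math.sin(angle)
    px, py, pz = p
    return [c * px - s * py + shift[0], s * px + c * py + shift[1], pz + shift[2]]


# Toy "surface atoms": three nodes, fully connected with directed edges.
features = [1.0, 0.5, -0.3]
coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
all_edges = [(i, j) for i in range(3) for j in range(3) if i != j]

h_a, x_a = egnn_step(features, coords, all_edges)
h_b, x_b = egnn_step(features, [rigid_transform(p) for p in coords], all_edges)
```

Under these assumptions, `h_b` should match `h_a`, and `x_b` should match `x_a` after applying `rigid_transform` to each point, up to floating-point error. A voxel-based 3D CNN offers no such guarantee, because rotating the protein changes which voxels the atoms fall into.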