Obtaining expressive molecular representations underpins AI-driven molecule design and scientific discovery. Research to date has focused mainly on atom-level homogeneous molecular graphs, ignoring the rich information in subgraphs or motifs. As for 3D structures, previous studies neither capture long-range dependencies efficiently nor account for the non-uniformity of interatomic distances. To address these issues, we formulate heterogeneous molecular graphs and introduce Molformer to exploit both molecular motifs and 3D geometry. Specifically, we extract motifs from functional groups for small molecules and via reinforcement learning for proteins, and construct heterogeneous molecular graphs composed of both atom-level and motif-level nodes. To utilize 3D spatial information, Molformer adopts a roto-translation invariant convolutional position encoding. This encoding is coupled with a multi-scale self-attention mechanism, which captures local fine-grained patterns at increasing contextual scales, and with an attentive farthest point sampling algorithm, which produces the molecular representations. We validate Molformer across several domains, including quantum chemistry, physiology, and biophysics. Experiments show that Molformer outperforms state-of-the-art baselines. Our work provides a promising way to utilize informative motifs and integrate 3D geometric information.
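As a rough illustration of the sampling idea mentioned above, the following is a minimal sketch of plain farthest point sampling over 3D atom coordinates. It is not the attentive variant used by Molformer, which the abstract does not specify in detail; the function name `farthest_point_sampling`, the `start` parameter, and the toy coordinates are all assumptions made for this example. Because the selection depends only on pairwise distances, it is unchanged when the whole structure is translated or rotated, which is the same property a roto-translation invariant encoding relies on.

```python
import math


def farthest_point_sampling(coords, k, start=0):
    """Return k indices chosen greedily so that each new point is the one
    farthest from the points already chosen (plain FPS, no attention term)."""
    n = len(coords)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of points")
    chosen = [start]
    # min_dist[i] holds the distance from point i to its nearest chosen point.
    min_dist = [math.dist(coords[i], coords[start]) for i in range(n)]
    while len(chosen) < k:
        nxt = max(range(n), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i in range(n):
            d = math.dist(coords[i], coords[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen


# Hypothetical toy coordinates: four atoms on the x-axis and one on the y-axis.
atoms = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (2.0, 0.0, 0.0),
    (5.0, 0.0, 0.0),
    (0.0, 3.0, 0.0),
]
picked = farthest_point_sampling(atoms, 3)

# The same structure translated by (1, 2, 3) and rotated by cycling the axes.
shifted = [(x + 1.0, y + 2.0, z + 3.0) for x, y, z in atoms]
rotated = [(y, z, x) for x, y, z in atoms]
picked_shifted = farthest_point_sampling(shifted, 3)
picked_rotated = farthest_point_sampling(rotated, 3)
```

In this sketch the sampled indices act as a small set of well-spread atoms; an attentive variant would presumably also weigh learned attention scores when choosing the next point, but that weighting is not reproduced here.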