Molecular fingerprints are important cheminformatics tools that map molecules into a vector space according to their functional groups, atom sequences, and other topological structures. In this paper, we investigate a novel molecular fingerprint, \emph{Anonymous-FP}, that captures rich information about the underlying interactions formed by links at small, medium, and large molecular scales. In detail, the possible inherent atom chains are sampled from each molecule and extended in a certain anonymous pattern. The molecular fingerprint \emph{Anonymous-FP} is then encoded using the natural language processing technique \emph{PV-DBOW}. \emph{Anonymous-FP} is studied on molecular property identification and shows valuable advantages such as rich information content, high experimental performance, and full structural significance. In the experimental verification, the scale of the atom chain and its anonymous manner both matter significantly to the overall representation ability of \emph{Anonymous-FP}. Generally, the typical scale $r = 8$ enhances the performance on a series of real-world molecules; specifically, the accuracy exceeds $93\%$ on all NCI datasets.
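The sampling and anonymization steps described above can be illustrated with a minimal sketch. The abstract does not specify the exact anonymous pattern, so this example assumes the common "anonymous walk" convention, in which each atom in a chain is replaced by the order of its first appearance. The function names, the toy molecular graph, and the use of random walks for chain sampling are all hypothetical choices made for illustration, not the method of the paper.

```python
import random


def sample_chain(adjacency, start, length, rng):
    """Sample one atom chain as a random walk of up to `length` atoms.

    Assumption: chains are sampled by uniform random walks over bonds.
    """
    chain = [start]
    while len(chain) < length:
        neighbors = adjacency[chain[-1]]
        if not neighbors:  # isolated atom: the chain cannot grow
            break
        chain.append(rng.choice(neighbors))
    return chain


def anonymize(chain):
    """Replace each atom by the index of its first appearance in the chain.

    Assumption: this is the 'anonymous walk' convention, e.g. the chain
    (5, 7, 5, 9) becomes the pattern (0, 1, 0, 2).
    """
    first_seen = {}
    pattern = []
    for atom in chain:
        if atom not in first_seen:
            first_seen[atom] = len(first_seen)
        pattern.append(first_seen[atom])
    return tuple(pattern)


def molecule_to_words(adjacency, length, walks_per_atom, seed=0):
    """Turn one molecule into a list of 'words' (anonymized chain patterns)."""
    rng = random.Random(seed)
    words = []
    for atom in sorted(adjacency):
        for _ in range(walks_per_atom):
            chain = sample_chain(adjacency, atom, length, rng)
            words.append("-".join(str(i) for i in anonymize(chain)))
    return words


# Hypothetical toy graph: the heavy atoms of ethanol, C0 - C1 - O2.
ethanol = {0: [1], 1: [0, 2], 2: [1]}
words = molecule_to_words(ethanol, length=4, walks_per_atom=2)
```

Under these assumptions, each molecule becomes a "document" whose "words" are anonymized chain patterns, and `length` plays the role of the chain scale. Such documents could then be embedded with any PV-DBOW implementation, for instance gensim's `Doc2Vec` with `dm=0`, although the paper's actual encoding setup may differ.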