String-based molecular representations play a crucial role in cheminformatics applications and, with the growing success of deep learning in chemistry, have been readily adopted into machine learning pipelines. However, traditional string-based representations such as SMILES are often prone to syntactic and semantic errors when produced by generative models. To address these problems, a novel representation, SELF-referencIng Embedded Strings (SELFIES), was proposed together with an accompanying open-source implementation. SELFIES is inherently 100% robust: every SELFIES string corresponds to a valid molecule. Since then, we have generalized SELFIES to support a wider range of molecules and semantic constraints, and we have streamlined its underlying grammar. We have implemented this updated representation in subsequent versions of \selfieslib, which also bring major advances in design, efficiency, and supported features. In this manuscript, we present the current status of \selfieslib (version 2.1.1).