Optimizing molecular design and discovering novel chemical structures that meet a given objective, such as a high quantitative estimate of drug-likeness (QED), is NP-hard because of the vast combinatorial design space of discrete molecular structures. This makes it nearly impossible to explore the search space exhaustively for de novo structures with the properties of interest. To address this challenge, mapping the intractable search space onto a lower-dimensional latent volume makes it more feasible to examine molecular candidates via inverse design. Autoencoders are well suited to this task: an encoder compresses the discrete molecular structure into a latent space, and a decoder maps latent points back to molecular structures. The continuity of the latent space, which characterizes the discrete chemical structures, provides a flexible representation for inverse design and the discovery of novel molecules. However, exploring this latent space requires some guidance on where to look for new structures. We propose using a convex hull around the latent representations of the molecules with the highest QED scores to enclose a tight subspace of the latent representation, as an efficient way to reveal novel molecules with high QED scores. We demonstrate the effectiveness of the proposed method by training the autoencoder on the QM9 dataset with the Self-Referencing Embedded Strings (SELFIES) representation and then carrying out inverse molecular design to uncover novel chemical structures.
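The convex-hull idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how points are drawn from the hull, so the sketch assumes sampling by random convex combinations of the top-scoring latent vectors. The arrays `latents` and `qed` are random stand-ins for encoder outputs and QED scores, and the helper names `top_k_by_score` and `sample_in_convex_hull` are hypothetical.

```python
import numpy as np


def top_k_by_score(latents, scores, k):
    """Return the k latent vectors with the highest scores."""
    idx = np.argsort(scores)[-k:]
    return latents[idx]


def sample_in_convex_hull(vertices, n_samples, rng):
    """Draw points that lie inside the convex hull of `vertices`.

    Any convex combination (non-negative weights that sum to 1) of the
    vertices lies in their convex hull, so Dirichlet weights suffice.
    """
    weights = rng.dirichlet(np.ones(len(vertices)), size=n_samples)
    return weights @ vertices


rng = np.random.default_rng(0)
latents = rng.normal(size=(1000, 8))  # assumed stand-in for encoder outputs
qed = rng.uniform(size=1000)          # assumed stand-in for QED scores

top = top_k_by_score(latents, qed, k=20)
candidates = sample_in_convex_hull(top, n_samples=5, rng=rng)
# In a full pipeline, `candidates` would be passed to the trained decoder
# to obtain SELFIES strings, which would then be scored for QED.
```

Convex combinations avoid constructing the hull explicitly, which becomes expensive in high-dimensional latent spaces. Note that Dirichlet weights do not sample the hull uniformly; the points concentrate toward its interior.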