Nigam et al. reported a genetic algorithm (GA) that uses the SELFIES representation and proposed an adaptive, neural-network-based penalty intended to improve the diversity of the generated molecules. The main claims of the paper are that this GA outperforms other generative techniques (as measured by the penalized logP) and that the neural-network-based adaptive penalty increases the diversity of the generated molecules. In this work, we investigated the reproducibility of these claims. Overall, we were able to reproduce comparable results using the SELFIES-based GA, but mostly by exploiting deficiencies of the (easily optimizable) fitness function (i.e., by generating long, sulfur-containing chains). We also reproduced the finding that the discriminator can be used to bias the generation toward molecules that are similar to the reference set. In addition, we attempted to quantify the evolution of the diversity, examined the influence of some hyperparameters, and proposed improvements to the adaptive penalty.