In this paper, we investigate the problem of string-based molecular generation via variational autoencoders (VAEs), which have served as a popular generative approach for various tasks in artificial intelligence. We propose a simple yet effective idea to improve the performance of VAEs for this task. Our main idea is to maintain multiple decoders while sharing a single encoder, i.e., a type of ensemble technique. Here, we first find that training each decoder independently may not be effective, as the bias of the ensemble decoder increases severely under its auto-regressive inference. To keep both the bias and the variance of the ensemble model small, our proposed technique is two-fold: (a) a different latent variable is sampled for each decoder (from the mean and variance estimated by the shared encoder) to encourage diverse decoder characteristics, and (b) a collaborative loss is used during training to control the aggregated quality of the decoders using their different latent variables. In our experiments, the proposed VAE model performs particularly well at generating samples from out-of-domain distributions.
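The sketch below illustrates the multi-decoder idea described above in PyTorch. It is a minimal illustration under stated assumptions, not the paper's exact model: the GRU encoder/decoders, hyperparameters, and the particular form of the collaborative term (cross-entropy on the averaged predictive distribution of all decoders) are illustrative choices standing in for the architecture and loss defined in the paper.

```python
# Minimal sketch of a VAE with a shared encoder and an ensemble of decoders,
# where each decoder receives its own latent sample from the shared posterior,
# plus a collaborative loss on the aggregated decoder outputs (assumed form).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiDecoderVAE(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=256,
                 latent_dim=128, num_decoders=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Shared encoder: a single GRU producing the posterior mean/log-variance.
        self.encoder_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        # Ensemble of decoders, each with its own parameters.
        self.decoder_rnns = nn.ModuleList(
            [nn.GRU(embed_dim + latent_dim, hidden_dim, batch_first=True)
             for _ in range(num_decoders)])
        self.decoder_outs = nn.ModuleList(
            [nn.Linear(hidden_dim, vocab_size) for _ in range(num_decoders)])

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer-encoded molecular strings (e.g., SMILES).
        emb = self.embed(tokens)
        _, h = self.encoder_rnn(emb)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])

        all_logits = []
        for rnn, out in zip(self.decoder_rnns, self.decoder_outs):
            # (a) A *different* latent sample per decoder, drawn from the
            # shared posterior via the reparameterization trick.
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            z_seq = z.unsqueeze(1).expand(-1, tokens.size(1), -1)
            dec_h, _ = rnn(torch.cat([emb, z_seq], dim=-1))
            all_logits.append(out(dec_h))
        return all_logits, mu, logvar


def loss_fn(all_logits, targets, mu, logvar, beta=1.0, collab_weight=1.0):
    # Per-decoder reconstruction loss (teacher-forced cross-entropy).
    recon = sum(F.cross_entropy(l.transpose(1, 2), targets) for l in all_logits)
    # (b) Collaborative term: cross-entropy of the averaged predictive
    # distribution over all decoders -- one plausible way to "control the
    # aggregated quality"; the paper's exact formulation may differ.
    avg_probs = torch.stack([F.softmax(l, dim=-1) for l in all_logits]).mean(0)
    collab = F.nll_loss(torch.log(avg_probs + 1e-8).transpose(1, 2), targets)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + collab_weight * collab + beta * kl
```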