Optimization in the latent space of variational autoencoders is a promising approach for generating high-dimensional discrete objects that maximize an expensive black-box property (e.g., drug-likeness in molecule generation, or function approximation with arithmetic expressions). However, existing methods lack robustness: they may explore regions of the latent space for which no data was available during training and where the decoder can be unreliable, leading to unrealistic or invalid generated objects. We propose to leverage the epistemic uncertainty of the decoder to guide the optimization process. This is not trivial, as a naive estimate of uncertainty in the high-dimensional, structured settings we consider would suffer from high estimator variance. To solve this problem, we introduce an importance sampling-based estimator that provides more robust estimates of epistemic uncertainty. Our uncertainty-guided optimization approach requires no modification of the model architecture or the training process. It produces samples with a better trade-off between the black-box objective and the validity of the generated samples, sometimes improving both simultaneously. We illustrate these advantages across several experimental settings in digit generation, arithmetic expression approximation, and molecule generation for drug design.