Discrete latent variables are considered important for real-world data, which has motivated research on Variational Autoencoders (VAEs) with discrete latents. However, standard VAE training is not possible in this case, and various strategies have been proposed to manipulate discrete distributions so that discrete VAEs can be trained similarly to conventional ones. Here we ask whether it is also possible to keep the discrete nature of the latents fully intact by applying a direct discrete optimization for the encoding model. The approach consequently departs strongly from standard VAE training by sidestepping sampling approximation, the reparameterization trick, and amortization. Discrete optimization is realized in a variational setting using truncated posteriors in conjunction with evolutionary algorithms. For VAEs with binary latents, we show (A) how such a discrete variational method ties into gradient ascent for network weights, and (B) how the decoder is used to select latent states for training. Conventional amortized training is more efficient and applicable to large neural networks. However, using smaller networks, we find direct discrete optimization to scale efficiently to hundreds of latents. More importantly, we find direct optimization to be highly competitive in `zero-shot' learning. In contrast to large supervised networks, the VAEs investigated here can, e.g., denoise a single image without previous training on clean data and/or on large image datasets. More generally, the studied approach shows that training VAEs is indeed possible without sampling-based approximation and reparameterization, which may be of interest for the analysis of VAE training in general. Furthermore, for `zero-shot' settings, direct optimization makes VAEs competitive where they have previously been outperformed by non-generative approaches.
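To illustrate the core idea of optimizing truncated posteriors with evolutionary operations, the following is a minimal sketch, not the authors' implementation: it assumes a Gaussian observation model with unit variance, a Bernoulli prior with probability `pi`, a generic `decoder` callable, and hyperparameters (`n_generations`, `n_children`, `p_flip`) chosen purely for illustration. The decoder scores candidate binary latent states via the log joint, and the fittest states are kept as the truncated set for a single data point.

```python
import numpy as np

def log_joint(x, s, decoder, pi):
    """Unnormalized log joint log p(x, s) under an assumed Gaussian noise model
    and Bernoulli prior (illustrative choices, not the paper's exact setup)."""
    mu = decoder(s)  # decoder maps a binary latent state to a data-space mean
    log_lik = -0.5 * np.sum((x - mu) ** 2)  # Gaussian log-likelihood (unit variance, up to const.)
    log_prior = np.sum(s * np.log(pi) + (1 - s) * np.log(1 - pi))
    return log_lik + log_prior

def evolve_truncated_states(x, K, decoder, pi, n_generations=10, n_children=8, p_flip=0.05):
    """Evolutionary update of a truncated set K of binary latent states for one data point.

    Children are generated by bitflip mutation; parents and children compete,
    and the |K| states best explaining x (as scored by the decoder) survive."""
    rng = np.random.default_rng(0)
    K = list(K)
    size = len(K)
    for _ in range(n_generations):
        children = []
        for s in K:
            for _ in range(n_children):
                flips = rng.random(s.shape) < p_flip  # bitflip mutation
                children.append(np.where(flips, 1 - s, s))
        pool = K + children
        # deduplicate candidate states, then keep the fittest |K| of them
        pool = list({tuple(c.astype(int)): c for c in pool}.values())
        pool.sort(key=lambda c: log_joint(x, c, decoder, pi), reverse=True)
        K = pool[:size]
    return K
```

In such a scheme, the surviving states in K define a truncated posterior for the data point, which can then be used to compute gradients for the decoder weights; the encoder-free selection of latent states is what replaces sampling, reparameterization, and amortization in this sketch.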