We develop a novel method for carrying out model selection for Bayesian autoencoders (BAEs) by means of prior hyper-parameter optimization. Inspired by the common practice of type-II maximum likelihood optimization and its equivalence to Kullback-Leibler divergence minimization, we propose to optimize the distributional sliced-Wasserstein distance (DSWD) between the output of the autoencoder and the empirical data distribution. The advantages of this formulation are that the DSWD can be estimated from samples and scales to high-dimensional problems. We carry out posterior estimation of the BAE parameters via stochastic gradient Hamiltonian Monte Carlo and turn our BAE into a generative model by fitting a flexible Dirichlet mixture model in the latent space. As a result, we obtain a powerful alternative to variational autoencoders, which are the preferred choice in modern applications of autoencoders for representation learning with uncertainty. We evaluate our approach qualitatively and quantitatively through an extensive experimental campaign on a range of unsupervised learning tasks and show that, in small-data regimes where priors matter, our approach provides state-of-the-art results, outperforming multiple competitive baselines.
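To make the sample-based estimation concrete, the following is a minimal sketch of the plain sliced-Wasserstein estimator between two empirical distributions: project both sample sets onto random directions, sort the 1-D projections, and average the resulting one-dimensional Wasserstein costs. Note this is the uniform-slicing variant; the DSWD used in the paper additionally optimizes the distribution over slicing directions, which is not shown here. All function and variable names are illustrative, not from the paper's code.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=200, p=2, rng=None):
    """Monte Carlo estimate of the sliced p-Wasserstein distance between
    two empirical distributions given as sample matrices X, Y of shape
    (n, d). Assumes equal sample sizes for simplicity."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # Draw random unit directions uniformly on the (d-1)-sphere.
    theta = rng.standard_normal((n_proj, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both sample sets onto each direction.
    Xp = X @ theta.T  # shape (n, n_proj)
    Yp = Y @ theta.T
    # The 1-D Wasserstein-p distance between empirical measures is the
    # L^p distance between sorted samples; average over all slices.
    Xp.sort(axis=0)
    Yp.sort(axis=0)
    return np.mean(np.abs(Xp - Yp) ** p) ** (1.0 / p)
```

In the model-selection setting described above, `X` would hold decoder outputs (samples pushed through the BAE) and `Y` a minibatch of training data; the estimator is differentiable almost everywhere, so it can drive gradient-based prior hyper-parameter optimization.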