When training a variational autoencoder (VAE) on a given dataset, the optimal number of latent variables is typically determined by grid search, a process that is costly in both computational time and carbon footprint. In this paper, we explore intrinsic dimension estimation (IDE) of the data and of the latent representations learned by VAEs. We show that discrepancies between the IDE of the mean and sampled representations of a VAE, after only a few steps of training, reveal the presence of passive variables in the latent space, which, in well-behaved VAEs, indicates a superfluous number of dimensions. Using this property, we propose FONDUE: an algorithm that quickly finds the number of latent dimensions beyond which the mean and sampled representations start to diverge (i.e., when passive variables are introduced), providing a principled method for selecting the number of latent dimensions for VAEs and autoencoders.
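The core signal exploited here can be illustrated with a small synthetic sketch. The snippet below is not the FONDUE algorithm itself; it is a minimal, hypothetical simulation assuming a latent space where a few "active" dimensions carry signal and the remaining "passive" dimensions have collapsed posterior means with near-prior posterior variance. A TwoNN-style intrinsic dimension estimate of the mean representations then recovers roughly the number of active dimensions, while the sampled representations (mean plus posterior noise) look full-dimensional, so the two estimates diverge. All constants (`N`, `D`, `K`, the noise scales) are illustrative choices, not values from the paper.

```python
import numpy as np

def twonn_id(x):
    """Two-nearest-neighbour intrinsic dimension estimate:
    uses the ratio of 2nd- to 1st-nearest-neighbour distances."""
    # Pairwise squared distances (fine for small N; use a KD-tree for large N).
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)          # exclude self-distances
    part = np.partition(d2, 1, axis=1)    # two smallest distances per point
    r1 = np.sqrt(part[:, 0])
    r2 = np.sqrt(part[:, 1])
    mu = r2 / r1
    return len(mu) / np.sum(np.log(mu))   # maximum-likelihood estimate

rng = np.random.default_rng(0)
N, D, K = 1000, 10, 3  # illustrative: D latent dims, K of them active

# Mean representations: K active dims carry signal,
# D-K passive dims have (nearly) collapsed means.
mean = np.zeros((N, D))
mean[:, :K] = rng.normal(size=(N, K))
mean[:, K:] = 1e-3 * rng.normal(size=(N, D - K))

# Posterior std: small for active dims, near the prior (~1) for passive dims.
std = np.concatenate([0.05 * np.ones(K), np.ones(D - K)])
sampled = mean + std * rng.normal(size=(N, D))

id_mean = twonn_id(mean)        # close to K: passive dims contribute nothing
id_sampled = twonn_id(sampled)  # close to D: passive noise fills all dims
print(f"IDE(mean) = {id_mean:.2f}, IDE(sampled) = {id_sampled:.2f}")
```

In this toy setting, `id_mean` stays near the number of active dimensions while `id_sampled` jumps toward the full latent size; it is this gap, appearing once passive variables are introduced, that the proposed method searches for.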