The manifold assumption holds that high-dimensional data are generated by varying a small set of parameters drawn from a low-dimensional latent space. Deep generative models (DGMs) are widely used to learn data representations in an unsupervised way. DGMs parameterize the underlying low-dimensional manifold in the data space using bottleneck architectures such as variational autoencoders (VAEs). The bottleneck dimension of a VAE is treated as a dataset-dependent hyperparameter and is fixed at design time after extensive tuning. Because the intrinsic dimensionality of most real-world datasets is unknown, the latent dimensionality chosen as a hyperparameter often does not match it. This mismatch can degrade model performance on representation learning and sample generation tasks. This paper proposes relevance encoding networks (RENs): a novel probabilistic VAE-based framework that uses the automatic relevance determination (ARD) prior in the latent space to learn the data-specific bottleneck dimensionality. The relevance of each latent dimension is learned directly from the data, jointly with the other model parameters, using stochastic gradient descent and a reparameterization trick adapted to non-Gaussian priors. We leverage the concept of DeepSets to capture permutation-invariant statistical properties in both the data and latent spaces for relevance determination. The proposed framework is general and flexible, and can be applied to state-of-the-art VAE models that use regularizers to impose specific characteristics on the latent space (e.g., disentanglement). Through extensive experiments on synthetic and public image datasets, we show that the proposed model learns the relevant latent bottleneck dimensionality without compromising the representation and generation quality of the samples.
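To make the role of an ARD prior concrete, the following is a minimal sketch of the classical Gaussian form of ARD applied to a VAE latent space. It is not the RENs formulation itself: the abstract indicates that RENs uses non-Gaussian priors and DeepSets-based relevance estimation, neither of which is reproduced here. The sketch assumes each latent dimension d has a zero-mean Gaussian prior with precision alpha_d, so a large alpha_d shrinks that dimension toward zero and marks it as irrelevant. The function names `ard_kl_per_dim` and `relevant_dims`, and the threshold `min_prior_var`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def ard_kl_per_dim(mu, var, alpha):
    """Per-dimension KL( N(mu, var) || N(0, 1/alpha) ).

    mu, var : mean and variance of the approximate posterior q(z|x).
    alpha   : ARD precision of the prior for each latent dimension
              (assumed Gaussian here; a simplification for illustration).
    """
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return 0.5 * (alpha * (var + mu ** 2) - 1.0 - np.log(alpha) - np.log(var))


def relevant_dims(alpha, min_prior_var=1e-2):
    """Boolean mask of latent dimensions whose prior variance 1/alpha
    exceeds a threshold (hypothetical pruning rule, not from the paper)."""
    return 1.0 / np.asarray(alpha, dtype=float) > min_prior_var


# Three latent dimensions; the second has a very large precision,
# so its prior variance collapses and it is flagged as irrelevant.
alpha = np.array([1.0, 1e4, 0.5])
mask = relevant_dims(alpha)

# With alpha = 1 the prior is the standard normal used in a plain VAE,
# so a standard-normal posterior incurs zero KL in every dimension.
kl = ard_kl_per_dim(mu=np.zeros(3), var=np.ones(3), alpha=np.ones(3))
```

In this simplified view, the number of `True` entries in `mask` plays the role of the learned bottleneck dimensionality. In a trained model the precisions would be learned jointly with the encoder and decoder rather than set by hand as they are here.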