Most modern latent variable and probabilistic generative models, such as the variational autoencoder (VAE), have certain indeterminacies that are unresolvable even with an infinite amount of data. Recent applications of such models have indicated the need for \textit{strongly} identifiable models, in which an observation corresponds to a unique latent code. Progress has been made towards reducing model indeterminacies while maintaining flexibility, most notably by the iVAE (arXiv:1907.04809 [stat.ML]), which excludes many -- but not all -- indeterminacies. We construct a full theoretical framework for analyzing the indeterminacies of latent variable models, and characterize them precisely in terms of properties of the generator functions and the latent variable prior distributions. To illustrate, we apply the framework to better understand the structure of recent identifiability results. We then investigate how we might specify strongly identifiable latent variable models, and construct two such classes of models. One is a straightforward modification of iVAE; the other uses ideas from optimal transport and leads to novel models and connections to recent work.