Most modern latent variable and probabilistic generative models, such as the variational autoencoder (VAE), have certain indeterminacies that are unresolvable even with an infinite amount of data. Recent applications of such models have indicated the need for strongly identifiable models, in which an observation corresponds to a unique latent code. Progress has been made towards reducing model indeterminacies while maintaining flexibility, most notably by the iVAE (arXiv:1907.04809 [stat.ML]), which excludes many -- but not all -- indeterminacies. We construct a full theoretical framework for analyzing the indeterminacies of latent variable models, and characterize them precisely in terms of properties of the generator functions and the latent variable prior distributions. To illustrate, we apply the framework to better understand the structure of recent identifiability results. We then investigate how we might specify strongly identifiable latent variable models, and construct two such classes of models. One is a straightforward modification of iVAE; the other uses ideas from optimal transport and leads to novel models and connections to recent work.