Most modern probabilistic generative models, such as the variational autoencoder (VAE), have certain indeterminacies that are unresolvable even with an infinite amount of data. Different tasks tolerate different indeterminacies; however, recent applications have indicated the need for strongly identifiable models, in which an observation corresponds to a unique latent code. Progress has been made towards reducing model indeterminacies while maintaining flexibility, and recent work excludes many, but not all, indeterminacies. In this work, we motivate model-identifiability in terms of task-identifiability, then construct a theoretical framework for analyzing the indeterminacies of latent variable models, which enables their precise characterization in terms of the generator function and prior distribution spaces. We reveal that strong identifiability is possible even with highly flexible nonlinear generators, and give two such examples. One is a straightforward modification of iVAE (arXiv:1907.04809 [stat.ML]); the other uses triangular monotonic maps, leading to novel connections between optimal transport and identifiability.