Most modern probabilistic generative models, such as the variational autoencoder (VAE), have certain indeterminacies that are unresolvable even with an infinite amount of data. Different tasks tolerate different indeterminacies; however, recent applications have indicated the need for strongly identifiable models, in which an observation corresponds to a unique latent code. Progress has been made towards reducing model indeterminacies while maintaining flexibility, and recent work excludes many, but not all, indeterminacies. In this work, we motivate model-identifiability in terms of task-identifiability, then construct a theoretical framework for analyzing the indeterminacies of latent variable models, which enables their precise characterization in terms of the generator function and prior distribution spaces. We reveal that strong identifiability is possible even with highly flexible nonlinear generators, and give two such examples. One is a straightforward modification of iVAE (arXiv:1907.04809 [stat.ML]); the other uses triangular monotonic maps, leading to novel connections between optimal transport and identifiability.