Contrastively trained encoders have recently been proven to invert the data-generating process: they encode each input, e.g., an image, into the true latent vector that generated the image (Zimmermann et al., 2021). However, real-world observations often have inherent ambiguities. For instance, images may be blurred or only show a 2D view of a 3D object, so multiple latents could have generated them. This makes the true posterior for the latent vector probabilistic with heteroscedastic uncertainty. In this setup, we extend the common InfoNCE objective and encoders to predict latent distributions instead of points. We prove that these distributions recover the correct posteriors of the data-generating process, including its level of aleatoric uncertainty, up to a rotation of the latent space. In addition to providing calibrated uncertainty estimates, these posteriors allow the computation of credible intervals in image retrieval. These intervals comprise images with the same latent as a given query, subject to its uncertainty.