Variational autoencoders (VAEs) are a popular framework for modeling complex data distributions; they can be efficiently trained via variational inference by maximizing the evidence lower bound (ELBO), at the expense of a gap to the exact (log-)marginal likelihood. While VAEs are commonly used for representation learning, it is unclear why ELBO maximization would yield useful representations, since unregularized maximum likelihood estimation cannot invert the data-generating process. Yet, VAEs often succeed at this task. We seek to elucidate this apparent paradox by studying nonlinear VAEs in the limit of near-deterministic decoders. We first prove that, in this regime, the optimal encoder approximately inverts the decoder -- a commonly used but unproven conjecture -- which we refer to as self-consistency. Leveraging self-consistency, we show that the ELBO converges to a regularized log-likelihood. This allows VAEs to perform what has recently been termed independent mechanism analysis (IMA): the regularization adds an inductive bias towards decoders with column-orthogonal Jacobians, which helps recover the true latent factors. The gap between ELBO and log-likelihood is therefore welcome, since it bears unanticipated benefits for nonlinear representation learning. In experiments on synthetic and image data, we show that VAEs uncover the true latent factors when the data-generating process satisfies the IMA assumption.
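To make the notion of "column-orthogonal Jacobians" concrete, the sketch below evaluates one common way of measuring it. The abstract does not state a formula, so the quantity used here is an assumption taken from the IMA literature: the sum of the log column norms of the decoder Jacobian minus the log of its absolute determinant. By Hadamard's inequality this quantity is non-negative, and it is zero exactly when the columns are orthogonal. The function names `log_abs_det` and `ima_contrast` and the two example matrices are hypothetical, and the code assumes a square, non-singular Jacobian.

```python
import math


def log_abs_det(matrix):
    """Return log|det(matrix)| for a square matrix via Gaussian elimination."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    total = 0.0
    for k in range(n):
        # Partial pivoting: pick the row with the largest entry in column k.
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[pivot][k] == 0.0:
            return float("-inf")  # singular matrix
        a[k], a[pivot] = a[pivot], a[k]
        total += math.log(abs(a[k][k]))
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
    return total


def ima_contrast(jacobian):
    """Sum of log column norms minus log|det J| (assumed IMA-style measure).

    Non-negative by Hadamard's inequality; zero iff the columns are orthogonal.
    Assumes a square Jacobian with no zero columns.
    """
    n = len(jacobian)
    log_col_norms = sum(
        0.5 * math.log(sum(jacobian[r][c] ** 2 for r in range(n)))
        for c in range(n)
    )
    return log_col_norms - log_abs_det(jacobian)


theta = 0.3
# Rotation followed by per-column scaling: columns remain orthogonal.
orthogonal_cols = [
    [2.0 * math.cos(theta), -0.5 * math.sin(theta)],
    [2.0 * math.sin(theta), 0.5 * math.cos(theta)],
]
# Shear: columns are not orthogonal.
sheared = [[1.0, 1.0], [0.0, 1.0]]

contrast_orthogonal = ima_contrast(orthogonal_cols)
contrast_sheared = ima_contrast(sheared)
```

For a decoder implemented in an autodiff framework, the Jacobian at a latent point would be obtained numerically and passed to a function like `ima_contrast`; a value near zero indicates that the decoder locally satisfies the column-orthogonality condition, while larger values indicate a stronger violation.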