Variational autoencoders (VAEs) optimize an objective that comprises a reconstruction loss (the distortion) and a KL term (the rate). The rate is an upper bound on the mutual information, and it is often interpreted as a regularizer that controls the degree of compression. Here we examine whether the rate term also improves generalization. We perform rate-distortion analyses in which we control the strength of the rate term, the network capacity, and the difficulty of the generalization problem. Lowering the strength of the rate term paradoxically improves generalization in most settings, and reducing the mutual information typically leads to underfitting. Moreover, we show that generalization performance continues to improve even after the mutual information saturates, indicating that the gap on the bound (i.e., the KL divergence relative to the inference marginal) affects generalization. This suggests that the standard spherical Gaussian prior is not an inductive bias that typically aids generalization, prompting further work to understand what choices of priors improve generalization in VAEs.
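The decomposition described above can be made explicit. A minimal sketch in standard notation (the weight β on the rate term is what is referred to as the "strength" of the rate term; β = 1 recovers the usual negative ELBO):

```latex
\mathcal{L}_\beta(x) =
\underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[-\log p_\theta(x \mid z)\right]}_{\text{distortion}}
\;+\; \beta \,
\underbrace{D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)}_{\text{rate}}
```

The claim that the rate upper-bounds the mutual information, with a gap given by the KL divergence to the inference marginal $q_\phi(z) = \mathbb{E}_{p(x)}[q_\phi(z \mid x)]$, follows from the standard identity:

```latex
\mathbb{E}_{p(x)}\!\left[D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)\right]
= I_q(x;z) + D_{\mathrm{KL}}\!\left(q_\phi(z)\,\|\,p(z)\right)
\;\ge\; I_q(x;z)
```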