Variational autoencoders optimize an objective that combines a reconstruction loss (the distortion) and a KL term (the rate). The rate is an upper bound on the mutual information between the data and the latent code, and it is often interpreted as a regularizer that controls the degree of compression. Here we examine whether including the rate also acts as an inductive bias that improves generalization. We perform rate-distortion analyses in which we control the strength of the rate term, the network capacity, and the difficulty of the generalization problem. Decreasing the strength of the rate paradoxically improves generalization in most settings, and reducing the mutual information typically leads to underfitting. Moreover, we show that generalization continues to improve even after the mutual information saturates, indicating that the gap in the bound (i.e., the KL divergence between the inference marginal and the prior) affects generalization. This suggests that the standard Gaussian prior is not an inductive bias that typically aids generalization, which motivates work on understanding which choices of prior improve generalization in VAEs.
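The relationship between the rate, the mutual information, and the gap in the bound can be written out explicitly. The following is a sketch in notation assumed here rather than taken from the abstract: $p_d(x)$ is the data distribution, $q_\phi(z \mid x)$ the encoder, $p_\theta(x \mid z)$ the decoder, $p(z)$ the prior, $q_\phi(z)$ the inference marginal, and $\beta$ the weight on the rate term.

```latex
\begin{align}
\mathcal{L}_\beta &= D + \beta R, \\
D &= -\,\mathbb{E}_{p_d(x)}\,\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big], \\
R &= \mathbb{E}_{p_d(x)}\big[\mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)\big] \\
  &= I_q(x; z) + \mathrm{KL}\big(q_\phi(z)\,\|\,p(z)\big) \;\ge\; I_q(x; z), \\
q_\phi(z) &= \mathbb{E}_{p_d(x)}\big[q_\phi(z \mid x)\big].
\end{align}
```

Read this way, once $I_q(x; z)$ saturates, the rate can change only through the marginal term $\mathrm{KL}(q_\phi(z)\,\|\,p(z))$, which is the gap in the bound that the abstract refers to.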