Our understanding of learning input-output relationships with neural nets has improved rapidly in recent years, but little is known about the convergence of the underlying representations, even in the simple case of linear autoencoders (LAEs). We show that when trained with proper regularization, LAEs can directly learn the optimal representation -- ordered, axis-aligned principal components. We analyze two such regularization schemes: non-uniform $\ell_2$ regularization and a deterministic variant of nested dropout [Rippel et al., ICML 2014]. Though both regularization schemes converge to the optimal representation, we show that this convergence is slow due to ill-conditioning that worsens with increasing latent dimension. We show that the inefficiency of learning the optimal representation is not inevitable -- we present a simple modification to the gradient descent update that greatly speeds up convergence empirically.
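As a minimal sketch of the kind of objective the abstract refers to (the notation here is assumed for illustration and not taken from the paper): with data matrix $X \in \mathbb{R}^{d \times n}$, encoder $W_1 \in \mathbb{R}^{k \times d}$, decoder $W_2 \in \mathbb{R}^{d \times k}$, and distinct penalty weights $0 < \lambda_1 < \cdots < \lambda_k$, a non-uniform $\ell_2$-regularized LAE objective can be written as
$$
\mathcal{L}(W_1, W_2) \;=\; \|X - W_2 W_1 X\|_F^2 \;+\; \sum_{i=1}^{k} \lambda_i \left( \|e_i^\top W_1\|_2^2 + \|W_2 e_i\|_2^2 \right),
$$
where $e_i$ is the $i$-th standard basis vector, so each latent dimension is penalized by its own $\lambda_i$. Unlike a uniform penalty, the distinct $\lambda_i$ break the rotational symmetry of the latent space, which is what allows the minimizer to align with the ordered, axis-aligned principal components.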