Dictionary learning, the problem of representing data as a combination of a few atoms, has long stood as a popular method for learning representations in statistics and signal processing. The most popular dictionary learning algorithm alternates between sparse coding and dictionary update steps, and a rich literature has studied its theoretical convergence. The success of dictionary learning relies on access to a "good" initial estimate of the dictionary and the ability of the sparse coding step to provide an unbiased estimate of the code. The growing popularity of unrolled sparse coding networks has led to the empirical finding that backpropagation through such networks performs dictionary learning. We offer a theoretical analysis of these empirical results through PUDLE, a Provable Unrolled Dictionary LEarning method. We provide conditions on the network initialization and the data distribution sufficient to recover and preserve the support of the latent code. Additionally, we address two challenges: first, vanilla unrolled sparse coding computes a biased code estimate, and second, gradients during backpropagation can become unstable. We show how to reduce the bias of the code estimate in the forward pass and that of the dictionary estimate in the backward pass. We propose strategies to resolve the learning instability by tuning network parameters and modifying the loss function. Overall, we highlight the impact of loss, unrolling, and backpropagation on convergence. We complement our findings with synthetic and image-denoising experiments. Finally, we demonstrate PUDLE's interpretability, a driving factor in designing deep networks based on iterative optimization, by deriving a mathematical relation between the network weights, its output, and the training set.
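To make the setup concrete, below is a minimal PyTorch sketch of the pipeline the abstract describes: ISTA iterations unrolled as network layers, with the dictionary learned by backpropagating a reconstruction loss through the unrolling. This is an illustrative assumption of the general technique, not the paper's released code; the dimensions, number of unrolled steps, l1 weight `lam`, and learning rate are all made-up values.

```python
import torch

torch.manual_seed(0)
m, p, n = 20, 50, 256       # signal dim, number of atoms, training examples (assumed)
n_unroll, lam = 30, 0.1     # unrolled ISTA iterations, l1 penalty (assumed)

# Synthesize data from a ground-truth dictionary and sparse codes.
D_true = torch.randn(m, p)
D_true /= D_true.norm(dim=0, keepdim=True)
codes = torch.randn(p, n) * (torch.rand(p, n) < 0.1).float()  # ~10% support
Y = D_true @ codes

# Dictionary estimate; the analysis assumes a "good" initialization,
# mimicked here by perturbing the ground truth.
D = (D_true + 0.3 * torch.randn(m, p)).requires_grad_(True)

def unrolled_sparse_coding(Y, D, n_unroll, lam):
    """Forward pass: ISTA iterations unrolled as network layers.
    The soft-threshold shrinks coefficients, so this vanilla unrolling
    yields a biased code estimate -- one of the issues the paper raises."""
    step = 1.0 / torch.linalg.matrix_norm(D.detach(), ord=2).item() ** 2
    x = torch.zeros(D.shape[1], Y.shape[1])
    for _ in range(n_unroll):
        r = Y - D @ x  # residual
        x = torch.nn.functional.softshrink(x + step * (D.T @ r), lam * step)
    return x

opt = torch.optim.SGD([D], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    x = unrolled_sparse_coding(Y, D, n_unroll, lam)
    loss = 0.5 * ((Y - D @ x) ** 2).sum() / n  # reconstruction loss
    loss.backward()                            # backprop through the unrolled network
    opt.step()
    with torch.no_grad():                      # keep atoms unit-norm
        D /= D.norm(dim=0, keepdim=True)
```

The gradient of the reconstruction loss flows into `D` both directly and through every unrolled iteration, which is the sense in which backpropagation through the network performs dictionary learning.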