Sparse coding refers to modeling a signal as a sparse linear combination of the elements of a learned dictionary. Sparse coding has proven to be a successful and interpretable approach in many applications, such as signal processing, computer vision, and medical imaging. While this success has spurred much work on sparse coding with provable guarantees, work on the setting where the learned dictionary is larger (or \textit{over-realized}) with respect to the ground truth is comparatively nascent. Existing theoretical results in the over-realized regime are limited to the case of noiseless data. In this paper, we show that for over-realized sparse coding in the presence of noise, minimizing the standard dictionary learning objective can fail to recover the ground-truth dictionary, regardless of the magnitude of the signal in the data-generating process. Furthermore, drawing from the growing body of work on self-supervised learning, we propose a novel masking objective and prove that minimizing this new objective can recover the ground-truth dictionary. We corroborate our theoretical results with experiments across several parameter regimes, showing that our proposed objective enjoys better empirical performance than the standard reconstruction objective.
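For concreteness, here is a minimal sketch of the setting; the symbols below are illustrative and need not match the notation used in the body of the paper. A sample is generated from a ground-truth dictionary $\mathbf{D}^{*} \in \mathbb{R}^{d \times k}$ with additive noise, and the standard objective fits an over-realized dictionary $\mathbf{D} \in \mathbb{R}^{d \times m}$, $m \ge k$, by sparse reconstruction:
\[
  \mathbf{y} = \mathbf{D}^{*}\boldsymbol{\alpha}^{*} + \boldsymbol{\varepsilon}, \qquad \|\boldsymbol{\alpha}^{*}\|_{0} \le s,
  \qquad
  \min_{\mathbf{D}}\; \mathbb{E}_{\mathbf{y}}\Big[\min_{\boldsymbol{\alpha}}\; \tfrac{1}{2}\,\big\|\mathbf{y} - \mathbf{D}\boldsymbol{\alpha}\big\|_{2}^{2} + \lambda\,\|\boldsymbol{\alpha}\|_{1}\Big].
\]
A masking objective in the spirit of self-supervised learning (again an illustrative form, not necessarily the exact objective analyzed here) withholds a random subset $M$ of coordinates when fitting the code, and scores reconstruction only on the withheld coordinates:
\[
  \min_{\mathbf{D}}\; \mathbb{E}_{\mathbf{y},\, M}\Big[\big\|\mathbf{y}_{M} - (\mathbf{D}\hat{\boldsymbol{\alpha}})_{M}\big\|_{2}^{2}\Big],
  \qquad
  \hat{\boldsymbol{\alpha}} \in \arg\min_{\boldsymbol{\alpha}}\; \tfrac{1}{2}\,\big\|\mathbf{y}_{M^{c}} - (\mathbf{D}\boldsymbol{\alpha})_{M^{c}}\big\|_{2}^{2} + \lambda\,\|\boldsymbol{\alpha}\|_{1}.
\]
Because the held-out coordinates are not used when fitting $\hat{\boldsymbol{\alpha}}$, noise on those coordinates cannot be absorbed into the code, which is the intuition for why a masking-style objective can avoid the failure mode of pure reconstruction.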