Over two decades of research in dictionary learning have produced a large collection of successful applications, yet theoretical guarantees for model recovery are known only when optimization is carried out in the same model class as that of the underlying dictionary. This work characterizes the surprising phenomenon that dictionary recovery can be facilitated by searching over the space of larger, over-realized models. This observation is general and independent of the specific dictionary learning algorithm used. We demonstrate this observation thoroughly in practice and provide an analysis of the phenomenon by tying recovery measures to generalization bounds. In particular, we show that model recovery can be upper-bounded by the sum of the empirical risk, a model-dependent quantity, and the generalization gap, reflecting our empirical findings. We further show that an efficient and provably correct distillation approach can be employed to recover the correct atoms from the over-realized model. As a result, our meta-algorithm provides dictionary estimates with consistently better recovery of the ground-truth model.
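The over-realization effect described above can be illustrated with a small synthetic experiment. The sketch below is not the paper's algorithm; it is a minimal toy setup using scikit-learn's `MiniBatchDictionaryLearning`, in which signals are generated from a known sparse model and dictionaries of increasing size are fit, with recovery measured by matching each ground-truth atom to its closest learned atom. All sizes, sparsity levels, and the `recovery_error` helper are illustrative choices, not quantities from the paper.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)

# Ground-truth dictionary: k unit-norm atoms in d dimensions (toy sizes).
d, k, n = 20, 10, 2000
D_true = rng.standard_normal((d, k))
D_true /= np.linalg.norm(D_true, axis=0)

# Each signal is a sparse combination of s randomly chosen true atoms.
s = 3
X = np.zeros((n, d))
for i in range(n):
    idx = rng.choice(k, size=s, replace=False)
    X[i] = D_true[:, idx] @ rng.standard_normal(s)

def recovery_error(D_hat, D_ref):
    # For each reference atom, cosine distance (up to sign) to its
    # best-matching learned atom, averaged over reference atoms.
    sims = np.abs(D_ref.T @ D_hat)        # (k, K) cosine similarities
    return 1.0 - sims.max(axis=1).mean()

# Fit an exactly-realized (K = k) and two over-realized (K > k) models.
for K in (k, 2 * k, 4 * k):
    model = MiniBatchDictionaryLearning(
        n_components=K, alpha=0.5, random_state=0
    ).fit(X)
    D_hat = model.components_.T           # sklearn stores atoms as rows
    D_hat /= np.linalg.norm(D_hat, axis=0) + 1e-12
    print(f"K = {K:3d}  recovery error = {recovery_error(D_hat, D_true):.4f}")
```

In this setting, distillation corresponds to keeping, for each ground-truth atom, only its best-matching learned atom; the abstract's claim is that the over-realized fits tend to contain closer matches than the exactly-sized one.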