Sparse coding is a class of unsupervised methods that learn a sparse representation of the input data as a linear combination of dictionary atoms weighted by a sparse code. This learning framework has led to state-of-the-art results in various image and video processing tasks. However, classical methods learn the dictionary and the sparse code by alternating optimization, usually without theoretical guarantees of either optimality or convergence because the problem is non-convex. Recent works on sparse coding with a complete dictionary provide strong theoretical guarantees thanks to advances in non-convex optimization. However, the initial non-convex approaches learn the dictionary sequentially, atom by atom, which leads to long execution times. More recent works seek to learn the entire dictionary at once, which substantially reduces the execution time, but their recovery performance degrades when only a finite number of data samples is available. In this paper, we propose an efficient sparse coding scheme based on a two-stage optimization. The proposed scheme leverages the global and local Riemannian geometry of the two-stage optimization problem and admits a fast implementation that achieves strong dictionary recovery performance with a finite number of samples, without atom-by-atom calculation. We further prove that, when used to recover a single atom, the proposed scheme can exactly recover any atom of the target dictionary with high probability from a finite number of samples. An application to wireless sensor data compression is also presented. Experiments on both synthetic and real-world data verify the efficiency and effectiveness of the proposed scheme.