Dictionary learning, the problem of recovering a sparsely used matrix $\mathbf{D} \in \mathbb{R}^{M \times K}$ and $N$ $s$-sparse vectors $\mathbf{x}_i \in \mathbb{R}^{K}$ from samples of the form $\mathbf{y}_i = \mathbf{D}\mathbf{x}_i$, is of increasing importance to applications in signal processing and data science. When the dictionary is known, recovery of $\mathbf{x}_i$ is possible even for sparsity linear in dimension $M$, yet to date, the only algorithms which provably succeed in the linear sparsity regime are Riemannian trust-region methods, which are limited to orthogonal dictionaries, and methods based on the sum-of-squares hierarchy, which requires super-polynomial time in order to obtain an error which decays in $M$. In this work, we introduce SPORADIC (SPectral ORAcle DICtionary Learning), an efficient spectral method on family of reweighted covariance matrices. We prove that in high enough dimensions, SPORADIC can recover overcomplete ($K > M$) dictionaries satisfying the well-known restricted isometry property (RIP) even when sparsity is linear in dimension up to logarithmic factors. Moreover, these accuracy guarantees have an ``oracle property" that the support and signs of the unknown sparse vectors $\mathbf{x}_i$ can be recovered exactly with high probability, allowing for arbitrarily close estimation of $\mathbf{D}$ with enough samples in polynomial time. To the author's knowledge, SPORADIC is the first polynomial-time algorithm which provably enjoys such convergence guarantees for overcomplete RIP matrices in the near-linear sparsity regime.
翻译:字典学习是从形如 $\mathbf{y}_i = \mathbf{D} \mathbf{x}_i$ 的 $N$ 个 $s$-稀疏向量 $\mathbf{x}_i \in \mathbb{R}^K$ 的样本中,恢复出被稀疏使用的矩阵 $\mathbf{D} \in \mathbb{R}^{M \times K}$ 的问题,目前已经被广泛应用于信号处理和数据科学领域。当字典已知时,即使稀疏度是线性的(与维度 $M$ 成正比),也可以恢复 $\mathbf{x}_i$,但迄今为止,唯一可以在线性稀疏度下得出可证明收敛结果的算法是 Riemannian trust-region 方法,这种方法仅适用于正交字典,并且需要基于 sum-of-squares 层次结构的方法来获得在 $M$ 上衰减的误差,这需要超多项式时间。在这项工作中,我们引入了 SPORADIC(基于重权后的协方差矩阵的谱方法),这是一种高效的谱方法,可应用于一类协方差矩阵。我们证明,当维度足够高时,SPORADIC 可以恢复过完备的 RIP(Restricted Isometry Property)矩阵($K > M$),即使稀疏度在线性加对数因子的情况下也能成功。此外,这些准确性保证具有“Oracle 性质”,即未知的稀疏向量 $\mathbf{x}_i$ 的支撑集和符号可以得到精确地恢复,可以在多项式时间内准确估计 $\mathbf{D}$。据作者所知,SPORADIC 是第一个具有这种收敛保证的多项式时间算法,可以在近乎线性稀疏度下,对于过完备的 RIP 矩阵得到收敛保证。