字典学习在近乎线性稀疏度下的应用 (Dictionary Learning for the Almost-Linear Sparsity Regime)

Dictionary learning, the problem of recovering a sparsely used matrix $\mathbf{D} \in \mathbb{R}^{M \times K}$ and $N$ $s$-sparse vectors $\mathbf{x}_i \in \mathbb{R}^{K}$ from samples of the form $\mathbf{y}_i = \mathbf{D}\mathbf{x}_i$, is of increasing importance to applications in signal processing and data science. When the dictionary is known, recovery of $\mathbf{x}_i$ is possible even for sparsity linear in dimension $M$, yet to date, the only algorithms which provably succeed in the linear sparsity regime are Riemannian trust-region methods, which are limited to orthogonal dictionaries, and methods based on the sum-of-squares hierarchy, which requires super-polynomial time in order to obtain an error which decays in $M$. In this work, we introduce SPORADIC (SPectral ORAcle DICtionary Learning), an efficient spectral method on family of reweighted covariance matrices. We prove that in high enough dimensions, SPORADIC can recover overcomplete ($K > M$) dictionaries satisfying the well-known restricted isometry property (RIP) even when sparsity is linear in dimension up to logarithmic factors. Moreover, these accuracy guarantees have an ``oracle property" that the support and signs of the unknown sparse vectors $\mathbf{x}_i$ can be recovered exactly with high probability, allowing for arbitrarily close estimation of $\mathbf{D}$ with enough samples in polynomial time. To the author's knowledge, SPORADIC is the first polynomial-time algorithm which provably enjoys such convergence guarantees for overcomplete RIP matrices in the near-linear sparsity regime.

翻译：字典学习是从形如 $\mathbf{y}_i = \mathbf{D} \mathbf{x}_i$ 的 $N$ 个 $s$-稀疏向量 $\mathbf{x}_i \in \mathbb{R}^K$ 的样本中，恢复出被稀疏使用的矩阵 $\mathbf{D} \in \mathbb{R}^{M \times K}$ 的问题，目前已经被广泛应用于信号处理和数据科学领域。当字典已知时，即使稀疏度是线性的（与维度 $M$ 成正比），也可以恢复 $\mathbf{x}_i$，但迄今为止，唯一可以在线性稀疏度下得出可证明收敛结果的算法是 Riemannian trust-region 方法，这种方法仅适用于正交字典，并且需要基于 sum-of-squares 层次结构的方法来获得在 $M$ 上衰减的误差，这需要超多项式时间。在这项工作中，我们引入了 SPORADIC（基于重权后的协方差矩阵的谱方法），这是一种高效的谱方法，可应用于一类协方差矩阵。我们证明，当维度足够高时，SPORADIC 可以恢复过完备的 RIP（Restricted Isometry Property）矩阵（$K > M$），即使稀疏度在线性加对数因子的情况下也能成功。此外，这些准确性保证具有“Oracle 性质”，即未知的稀疏向量 $\mathbf{x}_i$ 的支撑集和符号可以得到精确地恢复，可以在多项式时间内准确估计 $\mathbf{D}$。据作者所知，SPORADIC 是第一个具有这种收敛保证的多项式时间算法，可以在近乎线性稀疏度下，对于过完备的 RIP 矩阵得到收敛保证。