新的低秩逼近子集选择算法：离线和在线 (New Subset Selection Algorithms for Low Rank Approximation: Offline and Online)

Subset selection for the rank $k$ approximation of an $n\times d$ matrix $A$ offers improvements in the interpretability of matrices, as well as a variety of computational savings. This problem is well-understood when the error measure is the Frobenius norm, with various tight algorithms known even in challenging models such as the online model, where an algorithm must select the column subset irrevocably when the columns arrive one by one. In contrast, for other matrix losses, optimal trade-offs between the subset size and approximation quality have not been settled, even in the offline setting. We give a number of results towards closing these gaps. In the offline setting, we achieve nearly optimal bicriteria algorithms in two settings. First, we remove a $\sqrt k$ factor from a result of [SWZ19] when the loss function is any entrywise loss with an approximate triangle inequality and at least linear growth. Our result is tight for the $\ell_1$ loss. We give a similar improvement for entrywise $\ell_p$ losses for $p>2$, improving a previous distortion of $k^{1-1/p}$ to $k^{1/2-1/p}$. Our results come from a technique which replaces the use of a well-conditioned basis with a slightly larger spanning set for which any vector can be expressed as a linear combination with small Euclidean norm. We show that this technique also gives the first oblivious $\ell_p$ subspace embeddings for $1<p<2$ with $\tilde O(d^{1/p})$ distortion, which is nearly optimal and closes a long line of work. In the online setting, we give the first online subset selection algorithm for $\ell_p$ subspace approximation and entrywise $\ell_p$ low rank approximation by implementing sensitivity sampling online, which is challenging due to the sequential nature of sensitivity sampling. Our main technique is an online algorithm for detecting when an approximately optimal subspace changes substantially.

翻译：对于 $n \times d$ 矩阵 $A$ 的秩为 $k$ 的逼近问题，子集选择可以提高矩阵的可解释性，以及实现计算上的节约。当误差度量为 Frobenius 范数时，这个问题已经被很好地理解，甚至在挑战性的在线模型中已经有了几种紧凑算法，其中算法必须在每个列到达时不可逆地选择列子集。相反，对于其他矩阵损失，即使在离线设置下，子集大小和逼近质量之间的最优权衡也未能解决。我们给出了一些结果以缩小这些差距。在离线设置中，我们在两个设置中实现了近乎最优的双标准算法。首先，当损失函数是具有近似三角不等式和至少线性增长的任何条目损失时，我们消除了 [SWZ19] 的 $\sqrt{k}$ 因子。我们的结果对于 $\ell_1$ 损失是紧致的。我们对于 $\ell_p$ 条目损失 ($p > 2$)，从 $k^{1-1/p}$ 改进到 $k^{1/2-1/p}$。我们的结果来自一种技术，该技术使用一个稍大的跨度集合替代了一个良好条件的基础集来表示任何向量可以用小的欧几里得范数线性组合。我们证明，这一技术还提供了 $1 < p < 2$ 的第一个显式的 $\tilde O(d^{1/p})$ 畸变无意的 $\ell_p$ 子空间嵌入，这是几乎最优的，并解决了一个漫长的工作线。在在线设置中，我们通过在线实现灵敏度抽样，给出了第一个 $\ell_p$ 子空间逼近和条目 $\ell_p$ 低秩逼近的在线子集选择算法，这是有挑战的，因为灵敏度抽样是一种序列性质。我们的主要技术是在线算法，用于检测近似最优的子空间何时发生了显著变化。