The $\ell_p$ subspace approximation problem is an NP-hard low-rank approximation problem that generalizes the median hyperplane ($p = 1$), principal component analysis ($p = 2$), and center hyperplane ($p = \infty$) problems. A popular approach to coping with the NP-hardness is to compute a strong coreset, which is a weighted subset of the input points that simultaneously approximates the cost of every $k$-dimensional subspace, typically to $(1+\epsilon)$ relative error for a small constant $\epsilon$. We obtain an algorithm for constructing a strong coreset for $\ell_p$ subspace approximation of size $\tilde O(k\epsilon^{-4/p})$ for $p<2$ and $\tilde O(k^{p/2}\epsilon^{-p})$ for $p>2$. This offers the following improvements over prior work:

- We construct the first strong coresets with nearly optimal dependence on $k$ for all $p\neq 2$. In prior work, [SW18] constructed coresets of modified points with a similar dependence on $k$, while [HV20] constructed true coresets with polynomially worse dependence on $k$.
- We recover or improve the best known $\epsilon$ dependence for all $p$. In particular, for $p > 2$, the [SW18] coreset of modified points had a dependence of $\epsilon^{-p^2/2}$ and the [HV20] coreset had a dependence of $\epsilon^{-3p}$.

Our algorithm is based on sampling by root ridge leverage scores, which admits fast algorithms, especially for sparse or structured matrices. Our analysis avoids the use of the representative subspace theorem [SW18], which is a critical component of all prior dimension-independent coresets for $\ell_p$ subspace approximation. Our techniques also lead to the first nearly optimal online strong coresets for $\ell_p$ subspace approximation, with bounds similar to those in the offline setting, resolving a problem of [WY23]. All prior approaches lose $\mathrm{poly}(k)$ factors in this setting, even when allowed to modify the original points.
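
To make the strong coreset guarantee concrete, the following minimal Python sketch evaluates the $\ell_p$ subspace approximation cost $\sum_i w_i \,\mathrm{dist}(a_i, F)^p$ (or the maximum distance when $p = \infty$) for a subspace $F$ given by an orthonormal basis. The function name `subspace_cost`, the toy points, and the weights are illustrative assumptions and do not come from the paper. The toy coreset is built by merging duplicate points, which is exact for trivial reasons; it is not the root ridge leverage score sampling described above, and it only shows what "a weighted subset that approximates the cost of every subspace" means.

```python
import math


def subspace_cost(points, basis, p, weights=None):
    """Weighted l_p subspace approximation cost (illustrative helper).

    points  : list of d-dimensional points, each a list of floats
    basis   : list of orthonormal d-dimensional vectors spanning the subspace F
    p       : exponent, a float >= 1 or math.inf
    weights : optional per-point weights (defaults to all ones)
    """
    if weights is None:
        weights = [1.0] * len(points)
    terms = []
    for a, w in zip(points, weights):
        sq_norm = sum(x * x for x in a)
        # Squared length of the projection of a onto F (basis is orthonormal).
        sq_proj = sum(sum(x * u for x, u in zip(a, vec)) ** 2 for vec in basis)
        # Clamp at zero to guard against tiny negative values from rounding.
        dist = math.sqrt(max(sq_norm - sq_proj, 0.0))
        terms.append((w, dist))
    if p == math.inf:
        # Center-hyperplane cost: largest distance among points that are kept.
        return max((dist for w, dist in terms if w > 0), default=0.0)
    return sum(w * dist ** p for w, dist in terms)


# Toy data (assumed for illustration): the point [1, 0] appears twice.
full_points = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]

# Toy coreset: keep one copy of the duplicate and give it weight 2.
coreset_points = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
coreset_weights = [2.0, 1.0, 1.0]

# Two candidate 1-dimensional subspaces (lines through the origin) in the plane.
x_axis = [[1.0, 0.0]]
diagonal = [[0.6, 0.8]]

for line in (x_axis, diagonal):
    for p in (1, 3):
        full = subspace_cost(full_points, line, p)
        small = subspace_cost(coreset_points, line, p, coreset_weights)
        print(p, full, small)
```

In this toy case the two costs agree for every subspace, so the coreset has zero error. A sampling-based coreset of the kind the abstract describes would instead match the full cost only up to a $(1 \pm \epsilon)$ factor, simultaneously for all $k$-dimensional subspaces.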