We improve upon previous oblivious sketching and turnstile streaming results for $\ell_1$ and logistic regression, giving a much smaller sketching dimension that achieves an $O(1)$-approximation and yields an efficient optimization problem in the sketch space. Namely, for any constant $c>0$ we achieve a sketching dimension of $\tilde{O}(d^{1+c})$ for $\ell_1$ regression and $\tilde{O}(\mu d^{1+c})$ for logistic regression, where $\mu$ is a standard measure that captures the complexity of compressing the data. For $\ell_1$ regression our sketching dimension is near-linear and improves upon previous work, which either incurred an $\Omega(\log d)$-approximation with this sketching dimension or required a larger $\operatorname{poly}(d)$ number of rows. Similarly, for logistic regression previous work had worse $\operatorname{poly}(\mu d)$ factors in its sketching dimension. We also give a tradeoff that yields a $1+\varepsilon$ approximation in input sparsity time by increasing the total size to $(d\log(n)/\varepsilon)^{O(1/\varepsilon)}$ for $\ell_1$ and to $(\mu d\log(n)/\varepsilon)^{O(1/\varepsilon)}$ for logistic regression. Finally, we show that our sketch can be extended to approximate a regularized version of logistic regression in which the data-dependent regularizer corresponds to the variance of the individual logistic losses.