What guarantees are possible for solving logistic regression in one pass over a data stream? To answer this question, we present the first data-oblivious sketch for logistic regression. Our sketch can be computed in input-sparsity time over a turnstile data stream and reduces the size of a $d$-dimensional data set from $n$ to only $\operatorname{poly}(\mu d\log n)$ weighted points, where $\mu$ is a parameter that captures the complexity of compressing the data. Solving (weighted) logistic regression on the sketch gives an $O(\log n)$-approximation to the original problem on the full data set. We also show how to obtain an $O(1)$-approximation with slight modifications. Our sketches are fast, simple, and easy to implement, and our experiments demonstrate their practicality.
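The streaming properties named above (data-oblivious, linear, and therefore computable over a turnstile stream in input-sparsity time) can be illustrated with a minimal sketch. It is a hypothetical, simplified single-level hashing sketch, not the construction from this work: the class `HashingSketch`, the bucket count, and the uniform bucket weights are all assumptions made for illustration, and no approximation guarantee is claimed for this simplified version. Labels $y_i \in \{-1, 1\}$ are folded into the rows as $z_i = -y_i x_i$, so that the logistic loss becomes $\sum_i \log(1 + \exp(z_i^\top \beta))$.

```python
import numpy as np


class HashingSketch:
    """Hypothetical single-level oblivious sketch, for illustration only.

    Row i of the n x d matrix Z is added into bucket h(i). The hash h is drawn
    before any data arrives, so the sketch is oblivious to the data, and it is
    linear in Z, so turnstile updates (insertions and deletions) are supported.
    """

    def __init__(self, n, d, num_buckets, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed up front, independent of the data (the "oblivious" property).
        self.bucket_of = rng.integers(0, num_buckets, size=n)
        self.table = np.zeros((num_buckets, d))
        # Assumption: uniform weights. A real construction would choose them.
        self.weights = np.ones(num_buckets)

    def update(self, i, j, delta):
        """Turnstile update Z[i, j] += delta.

        Each update touches one table entry, so processing the whole stream
        takes time proportional to the number of nonzero updates.
        """
        self.table[self.bucket_of[i], j] += delta


def weighted_logistic_loss(Z, w, beta):
    """Return sum_k w_k * log(1 + exp(Z_k . beta)), computed stably."""
    return float(np.sum(w * np.logaddexp(0.0, Z @ beta)))


# Usage: stream the nonzeros of a random labelled data set into the sketch.
n, d = 1000, 5
rng = np.random.default_rng(1)
X = rng.standard_normal((n, d))
y = np.where(rng.random(n) < 0.5, -1.0, 1.0)
Z = -y[:, None] * X  # fold labels into rows: z_i = -y_i * x_i

sk = HashingSketch(n, d, num_buckets=64, seed=2)
for i, j in zip(*np.nonzero(Z)):
    sk.update(i, j, Z[i, j])

# Evaluate the weighted loss on the 64 sketched points at some beta.
loss_on_sketch = weighted_logistic_loss(sk.table, sk.weights, np.ones(d))
```

Because the table equals $SZ$ for a fixed 0/1 matrix $S$ with one nonzero per column, the order of the stream updates does not matter, and a deletion is simply an update with a negative `delta`.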