We provide a deterministic space-efficient algorithm for estimating ridge regression. For $n$ data points with $d$ features and a large enough regularization parameter, we provide a solution within $\varepsilon$ L$_2$ error using only $O(d/\varepsilon)$ space. This is the first $o(d^2)$ space deterministic streaming algorithm with guaranteed solution error and risk bound for this classic problem. The algorithm sketches the covariance matrix by variants of Frequent Directions, which implies it can operate in insertion-only streams and a variety of distributed data settings. In comparisons to randomized sketching algorithms on synthetic and real-world datasets, our algorithm has less empirical error using less space and similar time.
翻译:我们为估计山脊回归提供了一种确定性的空间高效算法。对于具有美元特性和足够大规范参数的一美元数据点,我们仅使用O(d/\varepsilon)美元空间,在$varepsilon $2美元错误的范围内提供一种解决办法。这是第一个具有保障性解决方案错误和风险的确定性空间流算法,与这一典型问题的保证性解决方案错误和风险挂钩。这一算法用“常见方向”变量描绘了共变矩阵,这意味着它可以在只插入的流和各种分布式数据设置中运作。在与合成和现实世界数据集随机的草图算法比较中,我们的算法使用较少空间和类似时间的经验错误。