Sparse regression is a popular approach for performing variable selection and for improving the prediction accuracy and interpretability of the resulting statistical model. Existing approaches focus on offline regularized regression, whereas the online setting has rarely been studied. In this paper, we propose a novel online sparse linear regression framework for analyzing streaming data in which data points arrive sequentially. The proposed method is memory-efficient and requires less stringent restricted strong convexity assumptions. Theoretically, we show that, with a properly chosen regularization parameter, the $\ell_2$-norm statistical error of our estimator diminishes to zero at the optimal rate of $\tilde{O}(\sqrt{s/t})$, where $s$ is the sparsity level, $t$ is the streaming sample size, and $\tilde{O}(\cdot)$ hides logarithmic terms. Numerical experiments demonstrate the practical efficiency of our algorithm.
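The abstract does not specify the algorithm, so the following is only an illustrative sketch of one generic way to do online sparse linear regression with memory that does not grow with the stream length $t$. It is not the authors' method. It assumes that each step keeps running sums of $x x^\top$ and $x y$ (memory $O(d^2)$, independent of $t$) and re-solves a lasso problem on those averaged statistics with warm-started coordinate descent. The regularization schedule $\lambda_t = c\sqrt{\log d / t}$ mirrors a common choice from offline lasso analysis; the class name `OnlineLassoSketch`, the constant `c`, and the `sweeps` parameter are hypothetical.

```python
import math


def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


class OnlineLassoSketch:
    """Hypothetical online lasso (not the paper's algorithm).

    Keeps running sums of x x^T and x y, so memory is O(d^2) no matter
    how many samples t have arrived; raw samples are never stored.
    """

    def __init__(self, d, c=1.0):
        self.d = d
        self.c = c  # hypothetical constant in lambda_t = c * sqrt(log(d) / t)
        self.t = 0
        self.sxx = [[0.0] * d for _ in range(d)]  # running sum of x x^T
        self.sxy = [0.0] * d                      # running sum of x * y
        self.beta = [0.0] * d                     # current estimate (warm start)

    def update(self, x, y, sweeps=20):
        """Absorb one streaming sample (x, y) and return the refreshed estimate."""
        self.t += 1
        for i in range(self.d):
            self.sxy[i] += x[i] * y
            for j in range(self.d):
                self.sxx[i][j] += x[i] * x[j]
        lam = self.c * math.sqrt(math.log(self.d) / self.t)
        self._coordinate_descent(lam, sweeps)
        return list(self.beta)

    def _coordinate_descent(self, lam, sweeps):
        # Minimizes (1 / 2t) * ||y - X b||^2 + lam * ||b||_1 using only the
        # accumulated statistics X^T X and X^T y, divided by t.
        t = self.t
        for _ in range(sweeps):
            for j in range(self.d):
                a = self.sxx[j][j] / t
                if a == 0.0:
                    self.beta[j] = 0.0
                    continue
                # Correlation of feature j with the partial residual that excludes j.
                partial = sum(
                    self.sxx[j][k] * self.beta[k] for k in range(self.d) if k != j
                )
                rho = (self.sxy[j] - partial) / t
                self.beta[j] = soft_threshold(rho, lam) / a
```

The $O(d^2)$ statistics make this sketch independent of $t$ but quadratic in the dimension $d$, so it only suits moderate $d$; warm-starting from the previous estimate keeps the number of coordinate-descent sweeps per sample small.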