In this paper, we develop an online statistical inference approach for high-dimensional generalized linear models (GLMs) with streaming data, enabling real-time estimation and inference. We propose an online debiased lasso (ODL) method that accommodates the special structure of streaming data. ODL differs from the offline debiased lasso in two important respects. First, in computing the estimate at the current stage, it uses only summary statistics of the historical data. Second, in addition to debiasing an online lasso estimator, ODL corrects an approximation error term arising from nonlinear online updating with streaming data. We show that the proposed online debiased estimators for GLMs are consistent and asymptotically normal. This result provides a theoretical basis for carrying out real-time interim statistical inference with streaming data. Extensive numerical experiments evaluate the performance of the proposed ODL method; they demonstrate the effectiveness of our algorithm and support the theoretical results. A streaming dataset from the National Automotive Sampling System-Crashworthiness Data System is analyzed to illustrate the application of the proposed method.
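To make the idea of estimating from summary statistics concrete, the following is a minimal sketch and not the ODL algorithm itself. It assumes the simplest GLM, a linear model with identity link, for which the running sums of X^T X and X^T y are exact summaries of the historical data, and it fits a plain lasso to them by coordinate descent. It performs no debiasing and does not include the approximation-error correction that nonlinear GLMs require. The names `StreamingLasso`, `soft_threshold`, and `simulate`, the penalty level, and the simulated data are all hypothetical choices made for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


class StreamingLasso:
    """Lasso for a linear model, refit from running summary statistics only."""

    def __init__(self, p, lam):
        self.p = p
        self.lam = lam                            # assumed fixed penalty level
        self.n = 0                                # observations seen so far
        self.S = [[0.0] * p for _ in range(p)]    # running X^T X
        self.r = [0.0] * p                        # running X^T y
        self.beta = [0.0] * p                     # current estimate (warm start)

    def update(self, X, y, n_sweeps=200):
        # Fold the new batch into the summaries; raw rows are not stored.
        for x_i, y_i in zip(X, y):
            for j in range(self.p):
                self.r[j] += x_i[j] * y_i
                for k in range(self.p):
                    self.S[j][k] += x_i[j] * x_i[k]
        self.n += len(y)
        # Coordinate descent on (1/2n) b'Sb - (1/n) r'b + lam * ||b||_1.
        for _ in range(n_sweeps):
            for j in range(self.p):
                if self.S[j][j] == 0.0:
                    continue
                others = sum(self.S[j][k] * self.beta[k]
                             for k in range(self.p) if k != j)
                rho = (self.r[j] - others) / self.n
                self.beta[j] = soft_threshold(rho, self.lam) / (self.S[j][j] / self.n)
        return list(self.beta)


def simulate(n, rng):
    """Hypothetical sparse linear model used only to exercise the sketch."""
    true_beta = [2.0, -1.0, 0.0, 0.0]
    X = [[rng.gauss(0, 1) for _ in true_beta] for _ in range(n)]
    y = [sum(b * x for b, x in zip(true_beta, row)) + 0.1 * rng.gauss(0, 1)
         for row in X]
    return X, y
```

Feeding the data in batches through `update` targets the same objective as a single call on the full dataset, because the objective depends on the data only through the accumulated sums. For a nonlinear link such as the logistic, no such exact summary exists, which is where the approximation error term mentioned in the abstract arises.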