We propose an online debiased lasso (ODL) method for statistical inference in high-dimensional linear models with streaming data. The proposed ODL consists of an efficient computational algorithm for streaming data and approximately normal estimators for the regression coefficients. At each stage of the analysis, its implementation requires only the current data batch in the stream and sufficient statistics of the historical data. A dynamic procedure is developed to select and update the tuning parameters upon the arrival of each new data batch, so that the amount of regularization can be adjusted adaptively along the data stream. The asymptotic normality of the ODL estimator is established under conditions similar to those in the offline setting, together with mild conditions on the size of the data batches in the stream; this provides theoretical justification for the proposed online statistical inference procedure. We conduct extensive numerical experiments to evaluate the performance of ODL. These experiments demonstrate the effectiveness of our algorithm and support the theoretical results. An air quality dataset and an index fund dataset from the Hong Kong Stock Exchange are analyzed to illustrate the application of the proposed method.
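The central computational claim above is that each stage needs only the current batch plus sufficient statistics of the past. The following is a minimal NumPy sketch of that idea for squared-error loss. It assumes the running Gram matrix and X^T y serve as the sufficient statistics. The class `StreamingLassoStats` and its methods are hypothetical. The solver is plain coordinate descent rather than the authors' ODL update, and `debias` takes the approximate inverse `M` as given. In high dimensions `M` would itself have to be estimated, which this sketch does not attempt.

```python
import numpy as np


class StreamingLassoStats:
    """Hypothetical helper: stores only summary statistics of past batches."""

    def __init__(self, p):
        self.n = 0
        self.gram = np.zeros((p, p))  # running sum of X^T X
        self.xty = np.zeros(p)        # running sum of X^T y

    def update(self, X_batch, y_batch):
        # Fold the new batch into the statistics; the raw batch can then be dropped.
        self.n += X_batch.shape[0]
        self.gram += X_batch.T @ X_batch
        self.xty += X_batch.T @ y_batch

    def lasso(self, lam, n_iter=200):
        # Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1,
        # expressed entirely through the Gram matrix and X^T y.
        S = self.gram / self.n
        r = self.xty / self.n
        b = np.zeros_like(r)
        for _ in range(n_iter):
            for j in range(len(b)):
                # Correlation of coordinate j with the partial residual (j excluded).
                rho = r[j] - S[j] @ b + S[j, j] * b[j]
                # Soft-thresholding update.
                b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / S[j, j]
        return b

    def debias(self, b, M):
        # One-step correction b + M X^T (y - X b) / n; the residual term
        # is again computable from the sufficient statistics alone.
        S = self.gram / self.n
        r = self.xty / self.n
        return b + M @ (r - S @ b)
```

Because `lasso` takes `lam` as an argument, a different tuning parameter can be chosen after each `update`. This loosely mirrors the adaptive regularization described above, though the selection rule itself is not shown.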