Streaming data pose new challenges for traditional statistical methods. The main difficulty is the rapidly growing volume and velocity of data, which makes it infeasible to store the full dataset in memory. This paper presents an online inference framework for the regression parameters of high-dimensional semiparametric single-index models with unknown link functions. The proposed online procedure uses only the current data batch and summary statistics of the historical data, rather than re-accessing the entire raw dataset. Moreover, it does not require estimating the unknown link function, which is a highly challenging task. The inference procedure is formulated for a general convex loss function; as illustrations, we use the Huber loss and the negative log-likelihood of the logistic regression model. We investigate the asymptotic normality of the proposed online debiased Lasso estimators and bounds for the proposed online Lasso estimators. Extensive simulation studies evaluate the performance of the proposed method, and we apply it to Nasdaq stock price and financial distress datasets.
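The idea of updating from the current batch plus summary statistics of past data can be illustrated with a minimal Python sketch. It is a deliberately simplified stand-in and not the procedure proposed in the paper: it assumes a linear model with squared loss (rather than a single-index model with a general convex loss), it performs no debiasing, and the class name `OnlineLassoSketch`, its parameters, and the proximal-gradient solver are hypothetical choices made for illustration only. Under squared loss, the Lasso objective depends on the data only through the running sums of `X.T @ X` and `X.T @ y`, so raw historical batches never need to be revisited.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


class OnlineLassoSketch:
    """Illustrative only: squared-loss Lasso fitted from summary statistics.

    Assumption: linear model with squared loss, which is simpler than the
    single-index / general convex loss setting described in the abstract.
    """

    def __init__(self, p, lam):
        self.gram = np.zeros((p, p))  # running sum of X_b^T X_b
        self.xty = np.zeros(p)        # running sum of X_b^T y_b
        self.n = 0                    # number of observations seen so far
        self.lam = lam                # l1 penalty level
        self.beta = np.zeros(p)       # current estimate (warm start)

    def update(self, X, y, n_iter=500):
        # Fold the new batch into the summaries; the batch can then be discarded.
        self.gram += X.T @ X
        self.xty += X.T @ y
        self.n += X.shape[0]

        # Proximal gradient (ISTA) on
        #   (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1,
        # whose gradient only needs the accumulated summaries.
        lipschitz = max(np.linalg.eigvalsh(self.gram)[-1], 1e-12) / self.n
        step = 1.0 / lipschitz
        for _ in range(n_iter):
            grad = (self.gram @ self.beta - self.xty) / self.n
            self.beta = soft_threshold(self.beta - step * grad, step * self.lam)
        return self.beta


# Usage: feed three batches one at a time.
rng = np.random.default_rng(0)
true_beta = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
model = OnlineLassoSketch(p=5, lam=0.1)
for _ in range(3):
    X_batch = rng.standard_normal((200, 5))
    y_batch = X_batch @ true_beta + 0.1 * rng.standard_normal(200)
    estimate = model.update(X_batch, y_batch)
```

In this simplified setting the estimate after the last batch coincides, up to solver tolerance, with the Lasso fit on all data pooled together, which is what makes the summaries sufficient. For non-quadratic losses such as the Huber loss or the logistic negative log-likelihood, no such exact reduction is available in general, so an online method has to rely on some approximation built from stored summaries.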