In big data research, an important problem is how to recover the set of true features, which may itself change over time, when data sets arrive sequentially. This paper presents a general framework for online updating variable selection and parameter estimation in generalized linear models with streaming data sets. The framework is based on an online updating penalized likelihood with either a differentiable or a non-differentiable penalty function. An online updating coordinate descent algorithm is proposed to solve the resulting optimization problem, and a method for selecting the tuning parameter in an online updating manner is suggested. Selection consistency, estimation consistency, and the oracle property are established theoretically. The methods are further examined and illustrated with numerical examples from simulation experiments and a real data analysis.
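To make the idea of online updating penalized estimation concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes the simplest special case: a Gaussian linear model (identity link) with a lasso penalty, where the running sums X^T X and X^T y are sufficient to re-solve the penalized problem by coordinate descent after each batch, so earlier raw data need not be stored. The names `OnlineLasso`, `soft_threshold`, and `lam` are hypothetical, and the tuning parameter is held fixed here rather than selected online.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator, the closed-form lasso coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)


class OnlineLasso:
    """Illustrative sketch (assumption): lasso for a Gaussian linear model,
    refreshed batch by batch from accumulated summary statistics only.

    Objective after n observations:
        (1 / (2n)) * ||y - X beta||^2 + lam * ||beta||_1
    """

    def __init__(self, p, lam):
        self.G = np.zeros((p, p))  # running X^T X
        self.c = np.zeros(p)       # running X^T y
        self.n = 0                 # number of observations seen so far
        self.lam = lam             # fixed penalty level (not tuned online here)
        self.beta = np.zeros(p)    # current estimate, reused as a warm start

    def update(self, X, y, n_sweeps=200, tol=1e-8):
        """Absorb one new batch, then rerun coordinate descent."""
        self.G += X.T @ X
        self.c += X.T @ y
        self.n += X.shape[0]
        for _ in range(n_sweeps):
            max_change = 0.0
            for j in range(self.beta.size):
                if self.G[j, j] == 0.0:
                    continue  # feature j has been all zeros so far
                # Correlation of feature j with the residual that excludes j.
                rho = self.c[j] - self.G[j] @ self.beta + self.G[j, j] * self.beta[j]
                new = soft_threshold(rho / self.n, self.lam) / (self.G[j, j] / self.n)
                max_change = max(max_change, abs(new - self.beta[j]))
                self.beta[j] = new
            if max_change < tol:
                break
        return self.beta.copy()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    true_beta = np.array([2.0, 0.0, -1.5, 0.0, 0.0, 1.0])  # simulated truth
    model = OnlineLasso(p=true_beta.size, lam=0.1)
    for b in range(5):
        X = rng.normal(size=(200, true_beta.size))
        y = X @ true_beta + 0.5 * rng.normal(size=200)
        print(f"batch {b + 1}:", np.round(model.update(X, y), 3))
```

In this special case the streamed estimate coincides, up to solver tolerance, with the estimate computed from all data at once, because the objective depends on the data only through the accumulated sums. For a general generalized linear model no such finite-dimensional summary exists, so an online method has to rely on some form of approximation; the abstract does not specify which one the paper uses.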