For nonparametric regression in the streaming setting, where data flow in continuously and require real-time analysis, a main challenge is that limited memory and storage force data to be discarded once they have been processed. We address this challenge by proposing a novel one-pass estimator based on penalized orthogonal basis expansions and by developing a general framework for studying the interplay between the statistical efficiency and the memory consumption of estimators. We show that the proposed estimator is statistically optimal under the memory constraint and has an asymptotically minimal memory footprint among all one-pass estimators of the same estimation quality. Numerical studies demonstrate that the proposed one-pass estimator is nearly as efficient as its non-streaming counterpart, which has access to all historical data.
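To make the idea of a one-pass estimator built on a penalized orthogonal basis expansion concrete, the following is a minimal sketch and not the estimator proposed in the paper. It rests on several assumptions that do not come from the abstract: the covariate is uniformly distributed on [0, 1], the basis is the orthonormal cosine basis, the number of basis terms is fixed in advance, and the penalty acts through hypothetical shrinkage weights of the form 1 / (1 + penalty * j^(2 * smoothness)). The class name `OnePassSeriesEstimator` and all parameter names are illustrative. Each observation updates a running average of the basis coefficients and is then discarded, so memory use depends on the number of basis terms and not on the number of observations seen.

```python
import math


def cosine_basis(x, num_terms):
    """Orthonormal cosine basis on [0, 1]: phi_0 = 1, phi_j = sqrt(2) * cos(pi * j * x)."""
    return [1.0] + [math.sqrt(2.0) * math.cos(math.pi * j * x)
                    for j in range(1, num_terms)]


class OnePassSeriesEstimator:
    """Illustrative streaming series estimator (assumes X is uniform on [0, 1]).

    Stores only num_terms coefficients and a counter, never the raw data.
    """

    def __init__(self, num_terms=8, penalty=1e-4, smoothness=2.0):
        self.num_terms = num_terms
        self.n = 0
        self.coef = [0.0] * num_terms
        # Hypothetical penalty-induced shrinkage weights; higher frequencies shrink more.
        self.shrink = [1.0 / (1.0 + penalty * j ** (2.0 * smoothness))
                       for j in range(num_terms)]

    def update(self, x, y):
        """Absorb one observation (x, y); it can be discarded afterwards."""
        self.n += 1
        phi = cosine_basis(x, self.num_terms)
        for j in range(self.num_terms):
            # Running mean of y * phi_j(x), which estimates the j-th coefficient
            # of the regression function under a uniform design.
            self.coef[j] += (y * phi[j] - self.coef[j]) / self.n

    def predict(self, x):
        """Evaluate the shrunken series estimate at x."""
        phi = cosine_basis(x, self.num_terms)
        return sum(w * c * p for w, c, p in zip(self.shrink, self.coef, phi))


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    est = OnePassSeriesEstimator()
    for _ in range(20000):
        x = rng.random()
        y = math.sin(2.0 * math.pi * x) + rng.gauss(0.0, 0.3)
        est.update(x, y)  # the pair (x, y) is not stored anywhere
    print(est.predict(0.25), math.sin(2.0 * math.pi * 0.25))
```

In this sketch the trade-off between statistical efficiency and memory shows up in `num_terms`: more terms reduce approximation bias but increase both the stored state and the variance, while the shrinkage weights damp the noisier high-frequency coefficients.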