Optimal Sampling Designs for Multi-dimensional Streaming Time Series with Application to Power Grid Sensor Data

Internet of Things (IoT) systems generate massive, high-speed, temporally correlated streaming data and are often coupled with online inference tasks under computational or energy constraints. Online analysis of such streaming time series faces a trade-off between statistical efficiency and computational cost. One important way to balance this trade-off is sampling, where only a small portion of the data is selected for model fitting and updating. Motivated by the demands of dynamic relationship analysis in IoT systems, we study the data-dependent sample selection and online inference problem for a multi-dimensional streaming time series, aiming to provide low-cost real-time analysis of high-speed power grid electricity consumption data. Inspired by the D-optimality criterion in the design of experiments, we propose a class of online data reduction methods that achieve an optimal sampling criterion and improve the computational efficiency of the online analysis. We show that the optimal solution amounts to a strategy that is a mixture of Bernoulli sampling and leverage score sampling. The leverage score sampling involves auxiliary estimations that have a computational advantage over recursive least squares updates. Theoretical properties of these auxiliary estimations are also discussed. When applied to European power grid consumption data, the proposed leverage-score-based sampling methods outperform the benchmark sampling method in online estimation and prediction. The general applicability of the sampling-assisted online estimation method is assessed via simulation studies.
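To make the idea of mixing Bernoulli sampling with leverage score sampling concrete, here is a minimal Python sketch for a streaming linear regression. It is an illustration under stated assumptions, not the method of the paper: the function name `mixed_sampling_rls`, the parameters `rate`, `alpha`, and `ridge`, and the particular form of the selection probability are all hypothetical. The abstract does not give the optimal probabilities or the auxiliary estimators, so the sketch computes a leverage-type score directly from a running second-moment matrix and updates the fit on the kept points with plain recursive least squares, without any reweighting.

```python
import numpy as np


def mixed_sampling_rls(X, y, rate=0.1, alpha=0.5, ridge=1e-3, seed=0):
    """Stream over (X, y); keep each point with a mixture probability and
    update a least-squares fit only on the kept points.

    Hypothetical illustration: `rate` is the target sampling fraction and
    `alpha` is the weight of the constant (Bernoulli) component.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    P = np.eye(d) / ridge          # inverse Gram matrix of the kept points
    beta = np.zeros(d)             # running coefficient estimate
    S = ridge * np.eye(d)          # running sum of x x^T over all points seen
    kept = []
    for t in range(n):
        x, target = X[t], y[t]
        S += np.outer(x, x)
        # Normalized leverage-type score; it averages to about 1 over the stream.
        score = x @ np.linalg.solve(S / (t + 1), x) / d
        # Mixture of a constant (Bernoulli) part and a score-proportional part.
        p = min(1.0, rate * (alpha + (1.0 - alpha) * score))
        if rng.random() < p:
            # Recursive least squares update via the Sherman-Morrison identity.
            Px = P @ x
            gain = Px / (1.0 + x @ Px)
            beta = beta + gain * (target - x @ beta)
            P = P - np.outer(gain, Px)
            kept.append(t)
    return beta, np.array(kept)
```

With `alpha = 1` the sketch reduces to plain Bernoulli sampling at the given rate, and with `alpha = 0` the selection probability is proportional to the leverage-type score alone. Solving a linear system for every incoming point is done here only for readability; it does not reflect the computational savings the abstract attributes to the auxiliary estimations.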
