In time series analysis, fitting an autoregressive model requires solving a Toeplitz ordinary least squares problem numerous times to find an appropriate model, which can severely inflate computation time on large data sets. Two recent algorithms, LSAR and Repeated Halving, apply randomized numerical linear algebra (RandNLA) techniques to fit autoregressive models to big time-series data. We investigate and compare the quality of these two approximation algorithms on large-scale synthetic and real-world data. While both algorithms produce comparable results on synthetic datasets, the LSAR algorithm appears to be more robust when applied to real-world time series data. We conclude that RandNLA is effective in the context of big-data time series.
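To make the problem referred to above concrete, the following is a minimal sketch (assuming NumPy; the helpers `ar_design` and `sketched_ar_fit` are hypothetical names introduced here) of how an AR(p) fit reduces to an ordinary least squares problem with a Toeplitz-structured design matrix, and how a sketch-and-solve approach subsamples rows of that problem. Uniform row sampling is used purely for illustration; LSAR and Repeated Halving instead sample by (approximate) leverage scores, which this sketch does not implement.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar_design(y, p):
    """Toeplitz-structured design matrix and target for an AR(p) least squares fit.

    The row for time t holds the lags [y[t-1], ..., y[t-p]]; the target is y[t].
    """
    n = len(y)
    X = np.column_stack([y[p - k : n - k] for k in range(1, p + 1)])
    return X, y[p:]

def sketched_ar_fit(y, p, s):
    """Sketch-and-solve AR(p) fit: subsample s rows of the OLS problem and
    solve the smaller system.  Uniform sampling is a simplification; the
    algorithms discussed in the paper use (approximate) leverage-score
    sampling, which is what gives their accuracy guarantees."""
    X, b = ar_design(y, p)
    m = X.shape[0]
    idx = rng.choice(m, size=s, replace=True)
    # With non-uniform sampling probabilities q_i, each sampled row would be
    # rescaled by 1 / sqrt(s * q_i); for uniform sampling the factor cancels.
    coef, *_ = np.linalg.lstsq(X[idx], b[idx], rcond=None)
    return coef

# Simulated AR(2) series: compare the full OLS fit with the sketched fit.
n, phi = 200_000, np.array([0.6, -0.3])
y = np.zeros(n)
for t in range(2, n):
    y[t] = phi @ y[t - 1:t - 3:-1] + rng.standard_normal()

X, b = ar_design(y, 2)
full, *_ = np.linalg.lstsq(X, b, rcond=None)
approx = sketched_ar_fit(y, 2, s=5_000)
print(full, approx)  # the two coefficient estimates should be close
```

In practice, model selection repeats this least squares solve for many candidate orders p, which is why reducing the cost of each solve via row sampling matters at scale.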