We apply methods from randomized numerical linear algebra (RandNLA) to develop improved algorithms for the analysis of large-scale time series data. We first develop a new fast algorithm to estimate the leverage scores of an autoregressive (AR) model in big data regimes. We show that the approximate leverage scores lie within a factor of $(1+\mathcal{O}(\varepsilon))$ of the true leverage scores with high probability. These theoretical results are then exploited to develop an efficient algorithm, called LSAR, for fitting an appropriate AR model to big time series data. With high probability, the proposed algorithm finds the maximum likelihood estimates of the parameters of the underlying true AR model, and its worst-case running time significantly improves on those of state-of-the-art alternatives in big data regimes. Empirical results on large-scale synthetic as well as real data strongly support the theoretical results and demonstrate the efficacy of this new approach.
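To make the central quantity concrete, the following is a minimal sketch of computing exact leverage scores for an AR($p$) design matrix; this is the quantity the abstract's algorithm approximates, not the paper's fast estimator itself. The helper names (`ar_design_matrix`, `leverage_scores`) and the AR(2) coefficients used in the simulation are illustrative assumptions, not from the source.

```python
import numpy as np

def ar_design_matrix(x, p):
    """Build the AR(p) design matrix: row for time t holds the p lagged
    values x[t-1], ..., x[t-p] (illustrative helper, not from the paper)."""
    n = len(x)
    return np.column_stack([x[p - j - 1 : n - j - 1] for j in range(p)])

def leverage_scores(X):
    """Exact leverage scores: squared row norms of an orthonormal basis
    for the column space of X, obtained here via a thin QR factorization."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

rng = np.random.default_rng(0)

# Simulate a stationary AR(2) process with assumed coefficients (0.5, -0.3).
n = 2000
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + rng.standard_normal()

X = ar_design_matrix(x, 2)
scores = leverage_scores(X)
# Leverage scores sum to the column rank of X (here 2) and each lies in [0, 1].
print(round(scores.sum(), 6))
```

Exact computation of these scores costs a full factorization of the tall-and-skinny design matrix; the point of a RandNLA approach such as LSAR is to approximate them to within the stated $(1+\mathcal{O}(\varepsilon))$ factor far more cheaply in big data regimes.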