The wide deployment of IoT sensors has enabled the collection of very large time series across different domains, on which advanced analytics can be performed to uncover unknown relationships, most importantly the correlations between them. However, current approaches for correlation search on time series are limited to a single temporal scale and simple types of relations, and cannot handle noise effectively. This paper presents the integrated SYnchronous COrrelation Search (iSYCOS) framework for finding multi-scale correlations in big time series. Specifically, iSYCOS integrates top-down and bottom-up approaches into a single auto-configured framework capable of efficiently extracting complex window-based correlations from big time series using mutual information (MI). Moreover, iSYCOS includes a novel MI-based theory to identify noise in the data, which is also used for pruning to improve iSYCOS performance. In addition, we design a distributed version of iSYCOS that can scale out on a Spark cluster to handle big time series. Our extensive experimental evaluation on synthetic and real-world datasets shows that iSYCOS can auto-configure on a given dataset to find complex multi-scale correlations. The pruning and optimisations improve iSYCOS performance by up to an order of magnitude, and the distributed iSYCOS scales out linearly on a computing cluster.
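To make the notion of a window-based, MI-scored correlation concrete, the following is a minimal sketch and not the iSYCOS algorithm itself. It assumes a simple histogram (plug-in) MI estimator with equal-width bins, since the abstract does not state which estimator is used, and it slides a fixed-size window over two aligned series. The names `mutual_information`, `windowed_mi`, `bins`, `window` and `step` are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def mutual_information(xs, ys, bins=4):
    """Plug-in (histogram) estimate of MI, in nats, between two aligned sequences.

    Assumption: equal-width binning; this is an illustrative estimator only.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")

    def discretise(values):
        lo, hi = min(values), max(values)
        if hi == lo:  # constant series: everything falls into one bin
            return [0] * len(values)
        width = (hi - lo) / bins
        # Clamp the maximum value into the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretise(xs), discretise(ys)
    n = len(xs)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)

    # MI = sum_{i,j} p(i,j) * log( p(i,j) / (p(i) * p(j)) )
    mi = 0.0
    for (i, j), c in joint.items():
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi


def windowed_mi(xs, ys, window, step=1, bins=4):
    """Return (start_index, MI) for every sliding window of length `window`."""
    return [
        (s, mutual_information(xs[s:s + window], ys[s:s + window], bins))
        for s in range(0, len(xs) - window + 1, step)
    ]


if __name__ == "__main__":
    # First half: y follows x exactly; second half: y is constant (no dependence).
    x = list(range(8)) + list(range(8))
    y = list(range(8)) + [0] * 8
    for start, score in windowed_mi(x, y, window=8, step=8):
        print(start, round(score, 3))
```

In this toy input the first window scores high and the second scores zero, which is the kind of contrast a window-based search would exploit. A multi-scale search would additionally vary `window`, and a noise threshold on the MI score could be used to skip unpromising windows; both are left out of this sketch.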