Complex systems, such as airplanes, cars, or financial markets, produce multivariate time series data consisting of system observations over a period of time. Such data can be interpreted as a sequence of segments, where each segment is associated with a certain state of the system. An important problem in this domain is to identify repeated sequences of states, known as motifs. Such motifs correspond to complex behaviors that capture common sequences of state transitions. For example, a motif of "making a turn" might manifest in sensor data as a sequence of states: slowing down, turning the wheel, and then speeding back up. However, discovering these motifs is challenging, because the individual states are unknown and need to be learned from the noisy time series. Simultaneously, the time series also needs to be precisely segmented and each segment needs to be associated with a state. Here we develop context-aware segmentation and clustering (CASC), a method for discovering common motifs in time series data. We formulate the problem of motif discovery as a large optimization problem, which we then solve using a greedy alternating minimization-based approach. CASC performs well in the presence of noise in the input data and is scalable to very large datasets. Furthermore, CASC leverages common motifs to more robustly segment the time series and assign segments to states. Experiments on synthetic data show that CASC outperforms state-of-the-art baselines by up to 38.2%, and two case studies demonstrate how our approach discovers insightful motifs in real-world time series data.
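To make the notion of a motif concrete, the following minimal sketch assumes the per-timestep state labels are already known and simply counts repeated sequences of consecutive states. It is not CASC itself, which must learn the states, the segmentation, and the motifs jointly from noisy data. The function names `to_segments` and `count_motifs`, the label alphabet, and the example sequence are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import groupby


def to_segments(labels):
    """Collapse per-timestep state labels into (state, length) segments."""
    return [(state, sum(1 for _ in run)) for state, run in groupby(labels)]


def count_motifs(labels, motif_len=3, min_count=2):
    """Count sequences of `motif_len` consecutive segment states that repeat."""
    states = [state for state, _ in to_segments(labels)]
    # Slide a window of length `motif_len` over the segment-level state sequence.
    windows = zip(*(states[i:] for i in range(motif_len)))
    counts = Counter(windows)
    return {motif: n for motif, n in counts.items() if n >= min_count}


# Hypothetical labels: C = cruise, S = slow down, T = turn the wheel, A = accelerate.
labels = list("CCCSSTTTAACCCCSTTAAACC")
segments = to_segments(labels)
motifs = count_motifs(labels)
print(segments)
print(motifs)
```

In this toy sequence the "making a turn" pattern (slow down, turn, accelerate) occurs twice with different segment lengths, which is why the sketch compares sequences of states rather than raw timesteps.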