A motif intuitively is a short time series that repeats itself approximately the same within a larger time series. Such motifs often represent concealed structures, such as heart beats in an ECG recording, or sleep spindles in EEG sleep data. Motif discovery (MD) is the task of finding such motifs in a given input series. As there are varying definitions of what exactly a motif is, a number of algorithms exist. As central parameters they all take the length l of the motif and the maximal distance r between the motif's occurrences. In practice, however, suitable values for r are very hard to determine upfront, and the found motifs show a high variability. Setting the wrong input value will result in a motif that is not distinguishable from noise. Accordingly, finding an interesting motif with these methods requires extensive trial-and-error. We present a different approach to the MD problem. We define k-Motiflets as the set of exactly k occurrences of a motif of length l, whose maximum pairwise distance is minimal. This turns the MD problem upside-down: Our central parameter is not the distance threshold r, but the desired size k of a motif set, which we show is considerably more intuitive and easier to set. Based on this definition, we present exact and approximate algorithms for finding k-Motiflets and analyze their complexity. To further ease the use of our method, we describe extensions to automatically determine the right/suitable values for its input parameters. Thus, for the first time, extracting meaningful motif sets without any a-priori knowledge becomes feasible. By evaluating real-world use cases and comparison to 4 state-of-the-art MD algorithms, we show that our proposed algorithm is (a) quantitatively superior, finding larger motif sets at higher similarity, (b) qualitatively better, leading to clearer and easier to interpret motifs, and (c) has the lowest runtime.
翻译:motif 直观地说是一个较短的时间序列, 它在更大的时间序列中重复了它本身大致相同。 这种元素往往代表隐藏的结构, 如 EG 记录中的心脏跳动, 或 EEEG 睡眠数据中的睡眠螺旋。 Motif 发现( MD) 的任务是在给定的输入序列中找到这样的运动形体。 由于对什么是“ motif ” 有不同的定义, 许多算法已经存在。 由于中央参数都使用 motif 的长度和 mostal 距离。 然而, 在实践中, r 的合适值往往代表隐藏的结构, 例如 ECG 记录中的心脏跳动, 或 EEG 睡眠数据中的睡眠螺旋曲。 设置错误输入值将产生一个无法与噪音区分的图案。 因此, 找到这些方法的有趣图案需要广泛的试算。 我们对MD 问题提出了一种不同的方法。 我们定义 k- Motif t, 我们定义 ktif 用于任何 ktif 时间序列的设置 。 我们定义 ktitudeal deal deal deal deal deal deal deal deal deal deal.