In this work, we consider the problem of pattern matching under the dynamic time warping (DTW) distance motivated by potential applications in the analysis of biological data produced by the third generation sequencing. To measure the DTW distance between two strings, one must "warp" them, that is, double some letters in the strings to obtain two equal-lengths strings, and then sum the distances between the letters in the corresponding positions. When the distances between letters are integers, we show that for a pattern P with m runs and a text T with n runs: 1. There is an O(m + n)-time algorithm that computes all locations where the DTW distance from P to T is at most 1; 2. There is an O(kmn)-time algorithm that computes all locations where the DTW distance from P to T is at most k. As a corollary of the second result, we also derive an approximation algorithm for general metrics on the alphabet.
翻译:在这项工作中,我们考虑了在动态时间扭曲(DTW)距离下匹配模式的问题。在分析第三代测序产生的生物数据时,由于潜在应用的动机,在动态时间扭曲(DTW)距离下进行匹配。为了测量DTW在两个字符串之间的距离,必须“扭动”两个字符串,即在字符串中双倍使用一些字母以获得两个等长字符串,然后在相应的位置上计算字母之间的距离。当字母之间的距离是整数时,我们发现,对于带有 m 运行的P 模式和有 n 运行的文本 T 模式,我们可以看到: 1. 有一种O(m + n) 时间算法,计算了DT 从 P 到 T 最多为 1 的所有地点; 2. 有一种O(kmn) 时间算法,计算了所有从 P 到 T 到 T 最多为 k 的 DT 的DW 距离的所有地点。作为第二个结果的必然结果,我们还为字母上的通用指标得出了一种近值算法。