Dynamic Time Warping (DTW) is a widely used similarity measure for comparing strings that encode time series data, with applications to areas including bioinformatics, signature verification, and speech recognition. The standard dynamic-programming algorithm for DTW takes $O(n^2)$ time, and there are conditional lower bounds showing that no algorithm can do substantially better. In many applications, however, the strings $x$ and $y$ may contain long runs of repeated letters, meaning that they can be compressed using run-length encoding. A natural question is whether the DTW-distance between these compressed strings can be computed efficiently in terms of the lengths $k$ and $\ell$ of the compressed strings. Recent work has shown how to achieve $O(k\ell^2 + \ell k^2)$ time, leaving open the question of whether a near-quadratic $\tilde{O}(k\ell)$-time algorithm might exist. We show that, if a small approximation loss is permitted, then a near-quadratic time algorithm is indeed possible: our algorithm computes a $(1 + \epsilon)$-approximation for $DTW(x, y)$ in $\tilde{O}(k\ell / \epsilon^3)$ time, where $k$ and $\ell$ are the number of runs in $x$ and $y$. Our algorithm allows for $DTW$ to be computed over any metric space $(\Sigma, \delta)$ in which distances are $O(log(n))$-bit integers. Surprisingly, the algorithm also works even if $\delta$ does not induce a metric space on $\Sigma$ (e.g., $\delta$ need not satisfy the triangle inequality).
翻译:动态时间扭曲( DTW) 是用来比较时间序列数据( 包括生物信息、 签名验证和语音识别) 的字符串的广泛使用相似度量。 DTW 的标准动态程序程序算法需要O (n2) 美元的时间, 并且有有条件的下限, 这表明没有算法可以做的更好。 然而, 在许多应用程序中, 字符x$ 和 $ 可能包含长长的重复字母, 这意味着它们可以使用运行长的编码压缩。 一个自然的问题是, 这些压缩的字符串之间的 DTW 距离能否以长度( 美元) 和压缩的字符串的 美元来有效计算 。 最近的工作表明, 如何实现 $( kell2 + k2) 和 美元 有条件的下限 。 在许多应用程序中, 字符x $ (k) 美元 和 美元 美元 的 时间算法可能存在 。 我们显示, 如果允许任何小的近似损失, 那么这些压缩字符串之间的时间算算法是 美元 。