We revisit the task of computing the edit distance in sublinear time. In the $(k,K)$-gap edit distance problem the task is to distinguish whether the edit distance of two strings is at most $k$ or at least $K$. It has been established by Goldenberg, Krauthgamer and Saha (FOCS '19), with improvements by Kociumaka and Saha (FOCS '20), that the $(k,k^2)$-gap problem can be solved in time $\widetilde O(n/k+\operatorname{poly}(k))$. One of the most natural questions in this line of research is whether the $(k,k^2)$-gap is best-possible for the running time $\widetilde O(n/k+\operatorname{poly}(k))$. In this work we answer this question by significantly improving the gap. Specifically, we show that in time $O(n/k+\operatorname{poly}(k))$ we can even solve the $(k,k^{1+o(1)})$-gap problem. This is the first algorithm that breaks the $(k,k^2)$-gap in this running time. Our algorithm is almost optimal in the following sense: In the low distance regime ($k\le n^{0.19}$) our running time becomes $O(n/k)$, which matches a known $n/k^{1+o(1)}$ lower bound for the $(k,k^{1+o(1)})$-gap problem up to lower order factors. Our result also reveals a surprising similarity of Hamming distance and edit distance in the low distance regime: For both, the $(k,k^{1+o(1)})$-gap problem has time complexity $n/k^{1\pm o(1)}$ for small $k$. In contrast to previous work, which employed a subsampled variant of the Landau-Vishkin algorithm, we instead build upon the algorithm of Andoni, Krauthgamer and Onak (FOCS '10). We first simplify their approach and then show how to to effectively prune their computation tree in order to obtain a sublinear-time algorithm in the given time bound. Towards that, we use a variety of structural insights on the (local and global) patterns that can emerge during this process and design appropriate property testers to effectively detect these patterns.
翻译:我们重新审视了在亚线性时间中计算编辑距离的任务。 在 $( k, k) $- gap 编辑距离问题中, 任务在于区分两个字符的编辑距离是最多为 $k$, 还是至少为 $。 由 Goldenberg, Krauthgamer 和 Saha ( FOCS'19) 建立, Kocimaka 和 Saha ( FOCS '20) 改进了 美元( k, k2) 美元- gap 问题可以在时间里解决 $( blofilte O) (n) (k, koperator name{poly} (k) 。 在这个研究线里, 最自然的问题之一是 $( k, k) 美元 或至少为 美元 。 在运行过程中, 美元 正在显示一个时间里 。</s>