We study edit distance computation with preprocessing: the preprocessing algorithm acts on each string separately, and the query algorithm then takes the two preprocessed strings as input. This model is inspired by scenarios in which we would like to compute the edit distance between many pairs drawn from the same pool of strings. Our results include:

Permutation-LCS: If the LCS between two permutations has length $n-k$, we can compute it \textit{exactly} with $O(n \log(n))$ preprocessing and $O(k \log(n))$ query time.

Small edit distance: For general strings, if their edit distance is at most $k$, we can compute it \textit{exactly} with $O(n \log(n))$ preprocessing and $O(k^2 \log(n))$ query time.

Approximate edit distance: For the most general input, we can approximate the edit distance to within a factor of $(7+o(1))$ with preprocessing time $\tilde{O}(n^2)$ and query time $\tilde{O}(n^{1.5+o(1)})$.

All of these results significantly improve over the state of the art in edit distance computation without preprocessing. Interestingly, by combining ideas from our algorithms with preprocessing, we also obtain new, improved results for approximating edit distance without preprocessing in subquadratic time.
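For orientation on the Permutation-LCS setting, the following is a minimal sketch of the classical baseline without preprocessing: the LCS of two permutations reduces to a longest increasing subsequence, computable in $O(n \log(n))$ time. This is a textbook reduction, not the preprocessing scheme described above, and the function name `permutation_lcs_length` is a hypothetical one chosen for illustration.

```python
from bisect import bisect_left


def permutation_lcs_length(a, b):
    """Length of the LCS of two permutations of the same set of values.

    Classical baseline (no preprocessing): relabel b by positions in a,
    then take the longest strictly increasing subsequence.
    """
    # Position of each value in a; values are distinct since a is a permutation.
    pos_in_a = {value: i for i, value in enumerate(a)}
    # A common subsequence of a and b is exactly an increasing run in seq.
    seq = [pos_in_a[value] for value in b]

    # tails[j] = smallest possible last element of an increasing
    # subsequence of length j + 1 seen so far (patience sorting).
    tails = []
    for x in seq:
        j = bisect_left(tails, x)
        if j == len(tails):
            tails.append(x)
        else:
            tails[j] = x
    return len(tails)


# Example: the LCS has length n - k with n = 4 and k = 1.
n = 4
k = n - permutation_lcs_length([3, 1, 2, 4], [1, 2, 3, 4])
```

The baseline spends $O(n \log(n))$ time on every pair, whereas the result stated above pays that cost once per string and then only $O(k \log(n))$ per query.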