Efficient Computation of Sequence Mappability

In the $(k,m)$-mappability problem, for a given sequence $T$ of length $n$, the goal is to compute a table whose $i$th entry is the number of indices $j \ne i$ such that the length-$m$ substrings of $T$ starting at positions $i$ and $j$ have at most $k$ mismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of $k=1$. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that, for $k=\mathcal{O}(1)$, works in $\mathcal{O}(n)$ space and, with high probability, in $\mathcal{O}(n \cdot \min\{m^k,\log^k n\})$ time. Our algorithm requires a careful adaptation of the $k$-errata trees of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. Our technique can also be applied to solve the all-pairs Hamming distance problem introduced by Crochemore et al. [WABI 2017]. We further develop $\mathcal{O}(n^2)$-time algorithms to compute all $(k,m)$-mappability tables for a fixed $m$ and all $k\in \{0,\ldots,m\}$ or a fixed $k$ and all $m\in\{k,\ldots,n\}$. Finally, we show that, for $k,m = \Theta(\log n)$, the $(k,m)$-mappability problem cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis fails. This is an improved and extended version of a paper that was presented at SPIRE 2018.
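To make the definition concrete, the following is a minimal brute-force sketch of the $(k,m)$-mappability table, computed directly from the definition in roughly $\mathcal{O}(n^2 m)$ time. It is not the errata-tree algorithm described in the abstract, only a reference baseline. The function name `mappability_naive`, the 0-based indexing, and the choice to report entries only for the $n-m+1$ positions where a full length-$m$ substring exists are assumptions made for illustration.

```python
def mappability_naive(text: str, m: int, k: int) -> list[int]:
    """Brute-force (k, m)-mappability table (illustrative baseline only).

    Entry i counts the indices j != i such that the length-m substrings
    starting at i and j differ in at most k positions. Positions are
    0-based, and only starts with a full length-m window are reported
    (both are assumptions, not taken from the paper).
    """
    starts = len(text) - m + 1
    if m <= 0 or starts <= 0:
        return []
    table = [0] * starts
    for i in range(starts):
        for j in range(i + 1, starts):
            mismatches = 0
            for d in range(m):
                if text[i + d] != text[j + d]:
                    mismatches += 1
                    if mismatches > k:
                        break  # this pair already exceeds the budget
            if mismatches <= k:
                # Hamming distance is symmetric, so credit both positions.
                table[i] += 1
                table[j] += 1
    return table


if __name__ == "__main__":
    print(mappability_naive("abab", 2, 0))  # [1, 0, 1]
    print(mappability_naive("abab", 2, 2))  # [2, 2, 2]
```

A baseline like this is mainly useful for checking a faster implementation on small inputs; its quadratic behavior is exactly what the algorithms in the paper aim to avoid for constant $k$.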
