Elastic-Degenerate String Matching via Fast Matrix Multiplication

An elastic-degenerate (ED) string is a sequence of $n$ sets of strings of total length $N$, which was recently proposed to model a set of similar sequences. The ED string matching (EDSM) problem is to find all occurrences of a pattern of length $m$ in an ED text. An $O(nm^{1.5}\sqrt{\log m}+N)$-time algorithm for EDSM is known [Aoyama et al., CPM 2018]. The standard assumption in prior work on this problem is that $N$ is substantially larger than both $n$ and $m$, and thus we would like to have a linear dependency on $N$. Under this assumption, the natural open problem is whether we can decrease the 1.5 exponent in the time complexity, as was done for the related (but, to the best of our knowledge, not equivalent) word break problem [Backurs and Indyk, FOCS 2016]. Our starting point is a conditional lower bound for EDSM. We use the popular combinatorial Boolean Matrix Multiplication (BMM) conjecture, which states that there is no truly subcubic combinatorial algorithm for BMM [Abboud and Williams, FOCS 2014]. By designing an appropriate reduction we show that a combinatorial algorithm solving the EDSM problem in $O(nm^{1.5-\epsilon}+N)$ time, for any $\epsilon>0$, refutes this conjecture. Our reduction should be understood as an indication that decreasing the exponent requires fast matrix multiplication. String periodicity and fast Fourier transform are two standard tools in string algorithms. Our main technical contribution is that we successfully combine these tools with fast matrix multiplication to design a non-combinatorial $\tilde{O}(nm^{\omega-1}+N)$-time algorithm for EDSM, where $\omega$ denotes the matrix multiplication exponent. To the best of our knowledge, we are the first to combine these tools. In particular, using the fact that $\omega<2.373$ [Le Gall, ISSAC 2014; Williams, STOC 2012], we obtain an $O(nm^{1.373}+N)$-time algorithm for EDSM.
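To make the problem statement concrete, the following is a minimal brute-force sketch of EDSM in Python. It is not the algorithm of this paper or of any cited work, and it makes no attempt to reach the running times discussed above. The function name `naive_edsm`, the representation of the ED text as a list of Python sets, and the convention of reporting the 0-based index of the segment in which an occurrence ends are all assumptions made for illustration.

```python
from typing import List, Set


def naive_edsm(text: List[Set[str]], pattern: str) -> List[int]:
    """Brute-force EDSM sketch (illustrative only, hypothetical helper).

    `text` is an ED text given as a list of segments, each a set of strings
    (the empty string is allowed). Returns the sorted 0-based indices of the
    segments in which at least one occurrence of `pattern` ends.
    """
    m = len(pattern)
    if m == 0:
        return []
    ends = set()
    # active holds lengths k (0 < k < m) such that pattern[:k] matches a path
    # that starts inside an earlier segment and ends exactly at the boundary
    # before the current segment.
    active = set()
    for i, segment in enumerate(text):
        new_active = set()
        for s in segment:
            # Case 1: the whole pattern lies inside a single string.
            if pattern in s:
                ends.add(i)
            # Case 2: extend a partial match carried over the boundary.
            for k in active:
                if s.startswith(pattern[k:]):
                    ends.add(i)  # the rest of the pattern is a prefix of s
                elif len(s) < m - k and pattern.startswith(s, k):
                    new_active.add(k + len(s))  # s is consumed entirely
            # Case 3: start a new partial match from a suffix of s.
            for length in range(1, min(len(s), m - 1) + 1):
                if s.endswith(pattern[:length]):
                    new_active.add(length)
        active = new_active
    return sorted(ends)


if __name__ == "__main__":
    ed_text = [{"a", "ab"}, {"c", ""}, {"d", "bd"}]
    print(naive_edsm(ed_text, "abd"))  # prints [2]
```

In the usage example, the pattern `abd` is spelled out by choosing `ab`, the empty string, and `d` from the three consecutive segments, so the occurrence ends in the segment with index 2. The sketch tracks only the set of matched prefix lengths across segment boundaries; the algorithms discussed above handle this prefix-propagation step far more efficiently.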
