We study the problem of matching a string in a labeled graph. Previous research has shown that, unless the Orthogonal Vectors Hypothesis (OVH) is false, one cannot solve this problem in strongly subquadratic time, nor index the graph in polynomial time to answer queries efficiently (Equi et al., ICALP 2019, SOFSEM 2021). These conditional lower bounds hold even for deterministic graphs over a binary alphabet, but there are also graph classes that are easy to index: for example, Wheeler graphs (Gagie et al., Theor. Comp. Sci. 2017) cover graphs admitting a Burrows-Wheeler transform-based indexing scheme. However, it is NP-complete to recognize whether a graph is a Wheeler graph (Gibney, Thankachan, ESA 2019). We propose an approach to alleviate this construction bottleneck of Wheeler graphs. Rather than starting from an arbitrary graph, we study graphs induced from multiple sequence alignments (MSAs). Elastic degenerate strings (Bernardini et al., SPIRE 2017, ICALP 2019) can be seen as such graphs, and we introduce their generalization: elastic founder graphs. We first prove that even such induced graphs are hard to index under OVH. We then introduce two subclasses, repeat-free and semi-repeat-free graphs, that are easy to index. We give a linear-time algorithm to construct a repeat-free non-elastic founder graph from a gapless MSA, and (parameterized) near-linear-time algorithms to construct semi-repeat-free (repeat-free, respectively) elastic founder graphs from general MSAs. Finally, we show that repeat-free elastic founder graphs admit a polynomial-time reduction to Wheeler graphs.
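To make the matching problem concrete, the following is a minimal sketch of the straightforward quadratic baseline, assuming a graph whose nodes each carry a single-character label. It is an illustration only, not an algorithm from this work: the function name `occurs_in_graph` and the input format (a `labels` dict and an `edges` list) are hypothetical. It keeps the set of nodes at which a path spelling the current pattern prefix can end, which takes O(|E| · m) time for a pattern of length m. This is the kind of running time that, under OVH, cannot be improved to strongly subquadratic.

```python
def occurs_in_graph(labels, edges, pattern):
    """Return True if `pattern` equals the concatenated labels of some path.

    labels:  dict mapping node id -> single-character label (assumed format)
    edges:   iterable of directed edges (u, v) (assumed format)
    pattern: the query string
    """
    if not pattern:
        return True  # the empty string matches trivially

    # Build adjacency lists of successors.
    succ = {v: [] for v in labels}
    for u, v in edges:
        succ[u].append(v)

    # frontier = nodes where a path spelling the current prefix can end.
    frontier = {v for v, c in labels.items() if c == pattern[0]}
    for ch in pattern[1:]:
        # Extend every partial match by one edge whose target has label `ch`.
        frontier = {w for v in frontier for w in succ[v] if labels[w] == ch}
        if not frontier:
            return False
    return bool(frontier)


# Small example graph: 0(a) -> 1(b) -> 3(b) and 0(a) -> 2(c) -> 3(b)
example_labels = {0: "a", 1: "b", 2: "c", 3: "b"}
example_edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
print(occurs_in_graph(example_labels, example_edges, "acb"))
print(occurs_in_graph(example_labels, example_edges, "abc"))
```

In the example graph, `"acb"` is spelled by the path 0, 2, 3, while no path spells `"abc"`. Each pattern character triggers a scan over the edges leaving the current frontier, which is where the quadratic cost comes from.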