Suffix trees are an important data structure at the core of optimal solutions to many fundamental string problems, such as exact pattern matching, longest common substring, matching statistics, and longest repeated substring. Recent lines of research focused on extending some of these problems to vertex-labeled graphs, although using ad-hoc approaches which in some cases do not generalize to all input graphs. In the absence of a ubiquitous tool like the suffix tree for labeled graphs, we introduce the labeled direct product of two graphs as a general tool for obtaining optimal algorithms: we obtain conceptually simpler algorithms for the quadratic problems of string matching (SMLG) and longest common substring (LCSP) in labeled graphs. Our algorithms are also more efficient, since they run in time linear in the size of the labeled product graph, which may be smaller than quadratic for some inputs, and their run-time is predictable, because the size of the labeled direct product graph can be precomputed efficiently. We also solve LCSP on graphs containing cycles, which was left as an open problem by Shimohira et al. in 2011. To show the power of the labeled product graph, we also apply it to solve the matching statistics (MSP) and the longest repeated string (LRSP) problems in labeled graphs. Moreover, we show that our (worst-case quadratic) algorithms are also optimal, conditioned on the Orthogonal Vectors Hypothesis. Finally, we complete the complexity picture around LRSP by studying it on undirected graphs.
翻译:suffix 树是许多基本字符串问题最佳解决方案核心的重要数据结构, 如精确模式匹配、 最长常见子字符串、 匹配统计数据和最长重复的子字符串。 最近的研究线侧重于将其中一些问题扩大到顶端贴标签的图形, 尽管使用 ad- hoc 方法, 这些方法在某些情况下并不概括到所有输入图中。 在没有像 后缀树这样的无处不在的工具 标签图时, 我们引入两个图表的贴上标签的直接复杂度产品, 作为获取最佳算法的一般工具 : 我们在标签的图形中, 获得字符串匹配( SMLG) 和最长的子字符串问题( LCSP ) 的概念上更简单的算法。 我们的算法也更有效率, 因为它们在标签产品图的大小中运行时间线性, 其运行时间可以预测, 因为标签的直径直产品图表的大小可以被快速转换为 。 我们还在含有字符串匹配的平面算法的平面算法上, 我们也可以在图表上解解解的 LCSP,, 也通过Sim- hill 来显示我们的平面图的图 。