Matching statistics were introduced to solve the approximate string matching problem, which is a recurrent subroutine in bioinformatics applications. In 2010, Ohlebusch et al. [SPIRE 2010] proposed a time and space efficient algorithm for computing matching statistics which relies on some components of a compressed suffix tree - notably, the longest common prefix (LCP) array. In this paper, we show how their algorithm can be generalized from strings to Wheeler deterministic finite automata. Most importantly, we introduce a notion of LCP array for Wheeler automata, thus establishing a first clear step towards extending (compressed) suffix tree functionalities to labeled graphs.
翻译:引入匹配统计数据是为了解决大致的字符串匹配问题,这是生物信息学应用中经常出现的一个子例程。 2010年,Ohlebusch 等人[SPIRE 2010] 提出了一个时间和空间高效算法,用于计算匹配统计数据,该算法依赖于压缩后缀树的某些组成部分 — — 特别是最长的常见前缀(LCP)阵列。在本文中,我们展示了他们的算法如何从字符串到惠勒确定性有限自动马塔的通用。最重要的是,我们引入了惠勒自动马塔LCP阵列的概念,从而确立了将(压缩的)后缀树功能扩展至标签图形的第一个明确步骤。