The suffix array is arguably one of the most important data structures in sequence analysis and consequently there is a multitude of suffix sorting algorithms. However, to this date the GSACA algorithm introduced in 2015 is the only known non-recursive linear-time suffix array construction algorithm (SACA). Despite its interesting theoretical properties, there has been little effort in improving the algorithm's subpar real-world performance. There is a super-linear algorithm DSH which relies on the same sorting principle and is faster than DivSufSort, the fastest SACA for over a decade. This paper is concerned with analysing the sorting principle used in GSACA and DSH and exploiting its properties in order to give an optimised linear-time algorithm. Our resulting algorithm is not only significantly faster than GSACA but also outperforms DivSufSort and DSH.
翻译:后缀阵列可以说是序列分析中最重要的数据结构之一,因此有多种后缀排序算法。 但是,迄今为止,2015年引入的GSACA算法是唯一已知的已知非线性线性后缀阵列构建算法(SACA ) 。 尽管它具有有趣的理论属性,但在改进此算法的次等真实世界性性能方面没有做多少努力。 有一种超级线性算法DSH 依赖相同的排序原则,而且比DivSufSort(这是十多年来最快的SACA)更快。 本文涉及分析在GSACA和DSH中使用的排序原则,并利用其属性来提供优化的线性算法。 我们所产生的算法不仅比GSACA快得多,而且比DivSufSort和DSH都快。