Suffix arrays and LCP arrays are one of the most fundamental data structures widely used for various kinds of string processing. We consider two problems for a read-only string of length $N$ over an integer alphabet $[1, \dots, \sigma]$ for $1 \leq \sigma \leq N$, the string contains $\sigma$ distinct characters, the construction of the suffix array, and a simultaneous construction of both the suffix array and LCP array. For the word RAM model, we propose algorithms to solve both of the problems in $O(N)$ time by using $O(1)$ extra words, which are optimal in time and space. Extra words means the required space except for the space of the input string and output suffix array and LCP array. Our contribution improves the previous most efficient algorithms, $O(N)$ time using $\sigma+O(1)$ extra words by [Nong, TOIS 2013] and $O(N \log N)$ time using $O(1)$ extra words by [Franceschini and Muthukrishnan, ICALP 2007], for constructing suffix arrays, and it improves the previous most efficient solution that runs in $O(N)$ time using $\sigma + O(1)$ extra words for constructing both suffix arrays and LCP arrays through a combination of [Nong, TOIS 2013] and [Manzini, SWAT 2004].
翻译:后缀阵列和 LCP 阵列是广泛用于各种字符串处理的最基本数据结构之一。 我们考虑两个问题, 在一个整数字母 $[11,\dots,\sgma]$$, $1\leq\sgma\leq\leqN$, 字符串包含$\sigma$ 不同字符, 后缀阵列的构建, 以及同时构建 后缀阵列和 LCP 阵列。 对于单词 RAM 模式, 我们建议用一个只读的长度字符串解决两个问题, $( N) $( N) 时间, 使用美元(1美元) 额外的单字, 在时间和空间上最优化。 额外单词意味着所需的空间, 除了输入字符串和输出后缀阵列和 LCP 阵列的空间之外。 我们的贡献改进了先前最有效的算法, $( NO) 时间, 在 [Nong, TOIS 2013] 和 $( Nlog N) 中用 额外单字, 在2007 里程中用 和 mainals 的 中用 和 美元构建前的 Orix 。