The Burrows-Wheeler Transform (BWT) is often taught in undergraduate courses on algorithmic bioinformatics, because it underlies the FM-index and thus important tools such as Bowtie and BWA. Its admirers consider the BWT a thing of beauty but, despite thousands of pages having been written about it over nearly thirty years, it still often seems like magic to undergraduates seeing it for the first time. Some who persevere are later shown the Positional BWT (PBWT), which was published twenty years after the BWT. In this paper we argue that the PBWT should be taught {\em before} the BWT. We first use the PBWT's close relation to a right-to-left radix sort to explain how to use it as a fast and space-efficient index for {\em positional search} on a set of strings (that is, given a pattern and a position, quickly list the strings containing that pattern starting at that position). We then observe that {\em prefix search} (listing all the strings that start with the pattern) is an easy special case of positional search, and that prefix search on the suffixes of a single string is equivalent to {\em substring search} in that string (listing all the starting positions of occurrences of the pattern in the string). Na\"ively storing a PBWT of the suffixes of a string is space-{\em inefficient} but, even in reasonably small examples, most of its columns are nearly the same. It is not difficult to show that if we store a PBWT of the cyclic shifts of the string, instead of its suffixes, then all the columns are exactly the same, and equal to the BWT of the string. Thus we can teach the BWT and the FM-index via the PBWT.
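The final claim can be checked on a small example. The following Python sketch is only an illustration under stated assumptions, not the paper's own construction: it assumes the string ends with a unique sentinel `$` that sorts before every other character, and it assumes column j of the PBWT lists the characters in column j with the rows ordered by the characters that follow column j, wrapping around cyclically (the order a right-to-left radix sort would produce if it continued past the first column). The function names `rotations`, `bwt`, and `cyclic_pbwt_column` are hypothetical.

```python
def rotations(text):
    """All cyclic shifts of text, in order of starting position."""
    return [text[i:] + text[:i] for i in range(len(text))]


def bwt(text):
    """Textbook BWT: last characters of the lexicographically sorted rotations.

    Assumes text ends with a unique sentinel '$' smaller than all other characters.
    """
    return "".join(rot[-1] for rot in sorted(rotations(text)))


def cyclic_pbwt_column(rows, j):
    """Column j of rows, read in the order given by sorting the rows on the
    characters after column j, wrapping around to the start of each row.

    The wraparound is an assumption of this sketch.
    """
    def key(i):
        row = rows[i]
        return row[j + 1:] + row[:j]  # everything except column j, starting just after it

    order = sorted(range(len(rows)), key=key)
    return "".join(rows[i][j] for i in order)


text = "banana$"
rows = rotations(text)
columns = [cyclic_pbwt_column(rows, j) for j in range(len(text))]

print(bwt(text))      # annb$aa
print(set(columns))   # {'annb$aa'}
```

Under these assumptions every column coincides with the BWT because the characters following column j in the row for shift i form the cyclic shift starting at position i + j + 1, and the character in column j is the one that precedes that shift in the string.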