计算外部内存中一组字符串的 BWT 和 LCP 阵列 (Computing the BWT and LCP array of a Set of Strings in External Memory) - 专知论文

会员服务 ·

0

外部记忆 · 情景 · 泛化理论 · Next · 后向 ·

2020 年 12 月 4 日

Computing the BWT and LCP array of a Set of Strings in External Memory

翻译：计算外部内存中一组字符串的 BWT 和 LCP 阵列

Paola Bonizzoni,Gianluca Della Vedova,Yuri Pirola,Marco Previtali,Raffaella Rizzi

from arxiv, Theoretical Computer Science (2020). arXiv admin note: text overlap with arXiv:1607.08342

Indexing very large collections of strings, such as those produced by the widespread next generation sequencing technologies, heavily relies on multistring generalization of the Burrows-Wheeler Transform (BWT): large requirements of in-memory approaches have stimulated recent developments on external memory algorithms. The related problem of computing the Longest Common Prefix (LCP) array of a set of strings is instrumental to compute the suffix-prefix overlaps among strings, which is an essential step for many genome assembly algorithms. In a previous paper, we presented an in-memory divide-and-conquer method for building the BWT and LCP where we merge partial BWTs with a forward approach to sort suffixes. In this paper, we propose an alternative backward strategy to develop an external memory method to simultaneously build the BWT and the LCP array on a collection of m strings of different lengths. The algorithm over a set of strings having constant length k has O(mkl) time and I/O volume, using O(k + m) main memory, where l is the maximum value in the LCP array.

翻译：将非常庞大的字符串收集成索引,例如由广泛的下一代测序技术生成的字符串收集,严重依赖Burrows-Wheeler变换(BWT)的多弦化概括化:大量模拟方法的要求刺激了外部记忆算法的近期发展。计算一组字符串中最长的常见前缀(LCP)阵列的相关问题有助于计算字符串之间的后缀-前缀重叠,这是许多基因组组组组算法的一个基本步骤。在上一份论文中,我们展示了一种构建 BWT 和 LCP 的模擬分解法。我们用前方方法将部分 BWT 和后缀法合并了。在本文中,我们提出了另一种后向战略,以开发一种外部记忆方法,同时在一系列不同长度的 mrings 上建立 BWT 和 LCP 阵列。一组字符串的算法有O( mkl) 时间和 I/O 卷, 使用 O (k + m) 主内存, 其中I 是 LCP 阵列中的最大值。

0

相关内容

外部记忆

【RLChina2020公开课】Lecture-11.pdf【多智能体学习与游戏AI前沿】

【RLChina2020公开课】Lecture-11.pdf【多智能体学习与游戏AI前沿】

专知会员服务

27+阅读 · 2020年8月6日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

80+阅读 · 2020年7月26日

【Google】具有秩-1因子的高效可扩展贝叶斯神经网络，Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

【Google】具有秩-1因子的高效可扩展贝叶斯神经网络，Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

专知会员服务

14+阅读 · 2020年5月19日

Python计算导论，560页pdf，Introduction to Computing Using Python

Python计算导论，560页pdf，Introduction to Computing Using Python

专知会员服务

75+阅读 · 2020年5月5日

【硬核书】数学博弈论与应用，431页pdf，Mathematical Game Theory and Applications

【硬核书】数学博弈论与应用，431页pdf，Mathematical Game Theory and Applications

专知会员服务

170+阅读 · 2020年4月18日

17篇知识图谱Knowledge Graphs论文 @AAAI2020

17篇知识图谱Knowledge Graphs论文 @AAAI2020

专知会员服务

172+阅读 · 2020年2月13日

【AAAI2020论文】关注实体以更好地理解文本（Attending to Entities for Better Text Understanding）

【AAAI2020论文】关注实体以更好地理解文本（Attending to Entities for Better Text Understanding）

专知会员服务

25+阅读 · 2019年11月15日

Understanding Color and the In-Camera Image Processing Pipeline for Computer Vision 【Michael S. Brown IEEE】韩国 ICCV 2019

Understanding Color and the In-Camera Image Processing Pipeline for Computer Vision 【Michael S. Brown IEEE】韩国 ICCV 2019

专知会员服务

10+阅读 · 2019年10月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

利用动态深度学习预测金融时间序列基于Python

利用动态深度学习预测金融时间序列基于Python

量化投资与机器学习

18+阅读 · 2018年10月30日

【论文推荐】最新六篇主题模型相关论文—收敛率、大规模、深度主题建模、优化、情绪强度、广义动态主题模型

【论文推荐】最新六篇主题模型相关论文—收敛率、大规模、深度主题建模、优化、情绪强度、广义动态主题模型

专知

11+阅读 · 2018年3月29日

人工智能 | 国际会议截稿信息5条

人工智能 | 国际会议截稿信息5条

Call4Papers

6+阅读 · 2017年11月22日

关小刷刷题08 – Leetcode 26. Remove Duplicates from Sorted Array 方法2、3

关小刷刷题08 – Leetcode 26. Remove Duplicates from Sorted Array 方法2、3

专知

3+阅读 · 2017年9月29日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

A statistical analysis of death rates in Italy for the years 2015-2020 and a comparison with the casualties reported for the COVID-19 pandemic

Arxiv

0+阅读 · 2021年2月8日

Frequency Regulation with Heterogeneous Energy Resources: A Realization using Distributed Control

Arxiv

0+阅读 · 2021年2月4日

HMC, an Algorithms in Data Mining, the Functional Analysis approach

Arxiv

0+阅读 · 2021年2月4日

Decoding of Space-Symmetric Rank Errors

Arxiv

0+阅读 · 2021年2月4日

Zeta Functions and the (Linear) Logic of Markov Processes

Arxiv

0+阅读 · 2021年2月4日

Zero Queueing for Multi-Server Jobs

Arxiv

0+阅读 · 2021年2月4日

The Parameterized Suffix Tray

Arxiv

0+阅读 · 2021年2月4日

CPR-GCN: Conditional Partial-Residual Graph Convolutional Network in Automated Anatomical Labeling of Coronary Arteries

CPR-GCN: Conditional Partial-Residual Graph Convolutional Network in Automated Anatomical Labeling of Coronary Arteries

Arxiv

8+阅读 · 2020年3月19日

Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering

Arxiv

5+阅读 · 2018年11月1日

High Performance Software in Multidimensional Reduction Methods for Image Processing with Application to Ancient Manuscripts

High Performance Software in Multidimensional Reduction Methods for Image Processing with Application to Ancient Manuscripts

Arxiv

4+阅读 · 2018年7月18日

VIP会员

文章信息

相关主题

相关VIP内容

【RLChina2020公开课】Lecture-11.pdf【多智能体学习与游戏AI前沿】

【RLChina2020公开课】Lecture-11.pdf【多智能体学习与游戏AI前沿】

专知会员服务

27+阅读 · 2020年8月6日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

80+阅读 · 2020年7月26日

【Google】具有秩-1因子的高效可扩展贝叶斯神经网络，Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

【Google】具有秩-1因子的高效可扩展贝叶斯神经网络，Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

专知会员服务

14+阅读 · 2020年5月19日

Python计算导论，560页pdf，Introduction to Computing Using Python

Python计算导论，560页pdf，Introduction to Computing Using Python

专知会员服务

75+阅读 · 2020年5月5日

【硬核书】数学博弈论与应用，431页pdf，Mathematical Game Theory and Applications

【硬核书】数学博弈论与应用，431页pdf，Mathematical Game Theory and Applications

专知会员服务

170+阅读 · 2020年4月18日

17篇知识图谱Knowledge Graphs论文 @AAAI2020

17篇知识图谱Knowledge Graphs论文 @AAAI2020

专知会员服务

172+阅读 · 2020年2月13日

【AAAI2020论文】关注实体以更好地理解文本（Attending to Entities for Better Text Understanding）

【AAAI2020论文】关注实体以更好地理解文本（Attending to Entities for Better Text Understanding）

专知会员服务

25+阅读 · 2019年11月15日

Understanding Color and the In-Camera Image Processing Pipeline for Computer Vision 【Michael S. Brown IEEE】韩国 ICCV 2019

Understanding Color and the In-Camera Image Processing Pipeline for Computer Vision 【Michael S. Brown IEEE】韩国 ICCV 2019

专知会员服务

10+阅读 · 2019年10月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

扩散模型中的 Transformer：图像生成及其延展应用询问 ChatGPT

281页pdf《神经网络设计入门》

【普林斯顿博士论文】以奖励推动生成式人工智能的发展：奖励引导生成的理论与方法

中文版 | 火力支援与巡飞弹药的未来（附原文）

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

利用动态深度学习预测金融时间序列基于Python

利用动态深度学习预测金融时间序列基于Python

量化投资与机器学习

18+阅读 · 2018年10月30日

【论文推荐】最新六篇主题模型相关论文—收敛率、大规模、深度主题建模、优化、情绪强度、广义动态主题模型

【论文推荐】最新六篇主题模型相关论文—收敛率、大规模、深度主题建模、优化、情绪强度、广义动态主题模型

专知

11+阅读 · 2018年3月29日

人工智能 | 国际会议截稿信息5条

人工智能 | 国际会议截稿信息5条

Call4Papers

6+阅读 · 2017年11月22日

关小刷刷题08 – Leetcode 26. Remove Duplicates from Sorted Array 方法2、3

关小刷刷题08 – Leetcode 26. Remove Duplicates from Sorted Array 方法2、3

专知

3+阅读 · 2017年9月29日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

A statistical analysis of death rates in Italy for the years 2015-2020 and a comparison with the casualties reported for the COVID-19 pandemic

Arxiv

0+阅读 · 2021年2月8日

Frequency Regulation with Heterogeneous Energy Resources: A Realization using Distributed Control

Arxiv

0+阅读 · 2021年2月4日

HMC, an Algorithms in Data Mining, the Functional Analysis approach

Arxiv

0+阅读 · 2021年2月4日

Decoding of Space-Symmetric Rank Errors

Arxiv

0+阅读 · 2021年2月4日

Zeta Functions and the (Linear) Logic of Markov Processes

Arxiv

0+阅读 · 2021年2月4日

Zero Queueing for Multi-Server Jobs

Arxiv

0+阅读 · 2021年2月4日

The Parameterized Suffix Tray

Arxiv

0+阅读 · 2021年2月4日

CPR-GCN: Conditional Partial-Residual Graph Convolutional Network in Automated Anatomical Labeling of Coronary Arteries

CPR-GCN: Conditional Partial-Residual Graph Convolutional Network in Automated Anatomical Labeling of Coronary Arteries

Arxiv

8+阅读 · 2020年3月19日

Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering

Arxiv

5+阅读 · 2018年11月1日

High Performance Software in Multidimensional Reduction Methods for Image Processing with Application to Ancient Manuscripts

High Performance Software in Multidimensional Reduction Methods for Image Processing with Application to Ancient Manuscripts

Arxiv

4+阅读 · 2018年7月18日

微信扫码咨询专知VIP会员