Practical KMP/BM Style Pattern-Matching on Indeterminate Strings


April 18, 2022


Hossein Dehghani, Neerja Mhaskar, W. F. Smyth

In this paper we describe two simple, fast, space-efficient algorithms for finding all matches of an indeterminate pattern $\boldsymbol{p} = \boldsymbol{p}[1..m]$ in an indeterminate string $\boldsymbol{x} = \boldsymbol{x}[1..n]$, where both $\boldsymbol{p}$ and $\boldsymbol{x}$ are defined on a "small" ordered alphabet $\Sigma$, say $\sigma = |\Sigma| \le 9$. Both algorithms depend on a preprocessing phase that replaces $\Sigma$ by an integer alphabet $\Sigma_I$ of size $\sigma_I = \sigma$ which (reversibly, in time linear in string length) maps both $\boldsymbol{x}$ and $\boldsymbol{p}$ into equivalent regular strings $\boldsymbol{y}$ and $\boldsymbol{q}$, respectively, on $\Sigma_I$, whose maximum (indeterminate) letter can be expressed in a 32-bit word (for $\sigma \le 4$, thus for DNA sequences, an 8-bit representation suffices). We first describe an efficient version KMP_Indet of the venerable Knuth-Morris-Pratt algorithm to find all occurrences of $\boldsymbol{q}$ in $\boldsymbol{y}$ (that is, of $\boldsymbol{p}$ in $\boldsymbol{x}$), but, whenever necessary, using the prefix array, rather than the border array, to control shifts of the transformed pattern $\boldsymbol{q}$ along the transformed string $\boldsymbol{y}$. Although requiring $O(m^2 n)$ time in the theoretical worst case, in cases of practical interest KMP_Indet executes in $O(n)$ time. We go on to describe a similar efficient version BM_Indet of the Boyer-Moore algorithm that turns out to execute significantly faster than KMP_Indet over a wide range of test cases. A noteworthy feature is that both algorithms require very little additional space: $\Theta(m)$ words. We conjecture that a similar approach may yield practical and efficient indeterminate equivalents to other well-known pattern-matching algorithms, in particular the several variants of Boyer-Moore.
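The abstract does not spell out the integer mapping, so the following is only a sketch under a stated assumption: each letter of $\Sigma$ is assigned a distinct prime, and an indeterminate letter (a set of letters) becomes the product of its primes. This assumption agrees with the stated sizes, since the product of the first nine primes (223092870) fits in a 32-bit word and 2·3·5·7 = 210 fits in 8 bits. Under it, two encoded letters match exactly when their greatest common divisor exceeds 1. The names `make_codes`, `encode`, `letters_match` and `find_all` are hypothetical, and `find_all` is a plain quadratic baseline, not KMP_Indet or BM_Indet.

```python
from math import gcd

# Assumption: one prime per letter; nine primes cover sigma <= 9.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23]


def make_codes(alphabet):
    """Assign a distinct prime to each letter of a small ordered alphabet."""
    if len(alphabet) > len(PRIMES):
        raise ValueError("alphabet too large for this sketch")
    return {letter: PRIMES[i] for i, letter in enumerate(alphabet)}


def encode(indet_string, codes):
    """Map each position (an iterable of letters) to the product of its primes."""
    encoded = []
    for position in indet_string:
        value = 1
        for letter in set(position):  # set() ignores repeated letters
            value *= codes[letter]
        encoded.append(value)
    return encoded


def letters_match(a, b):
    """Encoded letters match iff the underlying letter sets intersect."""
    return gcd(a, b) > 1


def find_all(y, q):
    """Baseline scan: 0-based start positions where q matches y."""
    n, m = len(y), len(q)
    return [
        i
        for i in range(n - m + 1)
        if all(letters_match(y[i + j], q[j]) for j in range(m))
    ]


codes = make_codes("ACGT")
y = encode(["A", "CG", "T", "A", "C", "T"], codes)  # text x, one set per position
q = encode(["AC", "CG", "T"], codes)                # pattern p
print(find_all(y, q))  # prints [0, 3]
```

Matching on indeterminate letters is not transitive: A matches {A, C} and {A, C} matches C, yet A does not match C. This is why shifts derived from the border array are not safe in general, consistent with the abstract's use of the prefix array where necessary.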

