A Data-Structure for Approximate Longest Common Subsequence of A Set of Strings - Zhuanzhi (专知) Papers


January 12, 2021

A Data-Structure for Approximate Longest Common Subsequence of A Set of Strings


Sepideh Aghamolaei

Comments (from arXiv): An optimal exact sketch for the LCS of two strings was already known (arXiv:1810.01238), as was an approximation algorithm with weights (https://doi.org/10.1016/j.ic.2010.12.006). The edit distance of regular languages was also known (https://doi.org/10.3390/a11110165). Using these subroutines in any algorithm for the LCS of k strings gives a better result.

Given a set $I$ of $k$ strings, their longest common subsequence (LCS) is the longest string that is a subsequence of every string in $I$. A data-structure for this problem preprocesses $I$ so that the LCS of a set of query strings $Q$ with the strings of $I$ can be computed faster. Since the problem is NP-hard for arbitrary $k$, we allow an error in which some characters may be replaced by other characters. We define the approximation version of the problem with an extra input $m$, the length of the regular expression (regex) that describes the input; the approximation factor is the logarithm of the number of possibilities in the regex returned by the algorithm, divided by the logarithm of the number of possibilities in the regex with the minimum number of possibilities. Then, we use a tree data-structure to achieve sublinear-time LCS queries. We also explain how the idea can be extended to the longest increasing subsequence (LIS) problem.
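To make the definition of the LCS of a set of strings concrete, here is a minimal exact baseline in Python. It is only an illustration of the problem statement, not the paper's tree data-structure or its regex-based approximation: it runs a memoized search over one index per string, so its cost grows exponentially with $k$, which is consistent with the NP-hardness noted in the abstract. The names `lcs_of_set` and `is_subsequence` are hypothetical helpers introduced for this sketch.

```python
from functools import lru_cache
from typing import Sequence, Tuple


def is_subsequence(candidate: str, text: str) -> bool:
    """Return True if `candidate` can be obtained from `text` by deleting characters."""
    remaining = iter(text)
    # `ch in remaining` consumes the iterator up to and including the match,
    # so the characters of `candidate` must appear in order.
    return all(ch in remaining for ch in candidate)


def lcs_of_set(strings: Sequence[str]) -> str:
    """Return one longest common subsequence of all input strings.

    Exact but exponential in the number of strings; intended for small inputs only.
    """
    items = tuple(strings)
    if not items:
        return ""

    @lru_cache(maxsize=None)
    def solve(positions: Tuple[int, ...]) -> str:
        # positions[i] is the current index into items[i].
        if any(p == len(s) for p, s in zip(positions, items)):
            return ""  # one string is exhausted, nothing more can be shared
        heads = {s[p] for p, s in zip(positions, items)}
        if len(heads) == 1:
            # Every string starts with the same character: take it.
            rest = solve(tuple(p + 1 for p in positions))
            return items[0][positions[0]] + rest
        # Otherwise skip one character in one string and keep the best result.
        best = ""
        for i in range(len(positions)):
            advanced = positions[:i] + (positions[i] + 1,) + positions[i + 1:]
            candidate = solve(advanced)
            if len(candidate) > len(best):
                best = candidate
        return best

    return solve((0,) * len(items))


if __name__ == "__main__":
    print(lcs_of_set(["abcde", "ace", "aXcYe"]))  # → ace
```

A data-structure in the sense of the abstract would aim to avoid recomputing this search from scratch for every query set $Q$; the baseline above is useful mainly for checking the output of a faster method on small inputs.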

