We study distributed algorithms for the string matching problem in the presence of wildcard characters. Given a string T (the text), we look for all occurrences of another string P (the pattern) as a substring of T. Each wildcard character in the pattern matches a specific class of strings depending on its type. String matching is one of the most fundamental problems in computer science, especially in the fields of bioinformatics and machine learning, and persistent effort since the 1960s has led to a variety of algorithms for the problem. With the rise of big data and the inevitable demand to solve problems on huge data sets, there have been many attempts to adapt classic algorithms to the MPC framework to obtain further efficiency. MPC is a recent framework for parallel computation on big data, designed to capture MapReduce-like algorithms. In this paper, we study the string matching problem using a set of tools translated to the MPC model. We consider three types of wildcards in string matching:

- '?' wildcard: the pattern may contain special '?' characters, or don't cares, each of which matches any single character of the text. String matching with don't cares can be solved by fast convolutions, and we give a constant-round MPC algorithm for it by implementing FFT in a constant number of MPC rounds.
- '+' wildcard: '+' is a special character that allows arbitrary repetitions of the preceding character. When the pattern contains '+' wildcards, our algorithm runs in a constant number of MPC rounds by reducing the problem to the subset matching problem.
- '*' wildcard: '*' is a special character that matches any substring of the text. When '*' is allowed in the pattern, we solve two special cases of the problem in a logarithmic number of rounds.
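The '?' bullet rests on the classic convolution trick for don't cares. The block below is a minimal sequential sketch of that trick, not the paper's MPC algorithm, assuming characters are encoded as small positive integers and '?' as 0. For an alignment i, the sum over j of p_j * t_{i+j} * (p_j - t_{i+j})^2 is zero exactly when every position either matches or holds a wildcard, since each term is non-negative; expanding the square turns it into three convolutions, computed here with a textbook radix-2 FFT. The names fft, convolve and match_with_dont_cares are hypothetical.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def convolve(a, b):
    """Linear convolution of integer lists via FFT, rounded back to integers."""
    size = len(a) + len(b) - 1
    n = 1
    while n < size:
        n *= 2
    fa = fft([complex(x) for x in a] + [0j] * (n - len(a)))
    fb = fft([complex(x) for x in b] + [0j] * (n - len(b)))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # The inverse transform is unnormalised, so divide by n here.
    return [round(v.real / n) for v in prod[:size]]


def match_with_dont_cares(text, pattern, wildcard="?"):
    """Start positions where pattern (possibly containing '?') occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []  # empty or over-long patterns are not handled in this sketch
    # Encode: wildcard -> 0, every other character -> a small positive integer.
    alphabet = sorted(set(text + pattern) - {wildcard})
    code = {c: i + 1 for i, c in enumerate(alphabet)}
    code[wildcard] = 0
    t = [code[c] for c in text]
    p = [code[c] for c in reversed(pattern)]  # reversed so convolution aligns
    # sum_j p t (p - t)^2 = sum p^3 t - 2 sum p^2 t^2 + sum p t^3
    c3 = convolve([x ** 3 for x in p], t)
    c2 = convolve([x ** 2 for x in p], [x ** 2 for x in t])
    c1 = convolve(p, [x ** 3 for x in t])
    # Alignment i corresponds to index i + m - 1 of the full convolution.
    return [i for i in range(n - m + 1)
            if c3[i + m - 1] - 2 * c2[i + m - 1] + c1[i + m - 1] == 0]
```

For example, match_with_dont_cares("abcabd", "ab?") returns [0, 3]. Rounding the floating-point FFT output back to integers is safe here only because the encoded values stay small; a large alphabet or a long pattern would call for an exact number-theoretic transform instead.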
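For the '+' bullet, the following brute-force reference pins down the semantics rather than reproducing the subset-matching reduction or any MPC round structure. It assumes, for illustration, that 'x+' matches one or more consecutive copies of x (the usual regular-expression reading), and it reports every start position of the text at which some match begins. The helpers parse_plus_pattern and match_with_plus are hypothetical names.

```python
def parse_plus_pattern(pattern):
    """Split a pattern into (char, repeatable) tokens; 'x+' means one or more x."""
    tokens = []
    for ch in pattern:
        if ch == "+":
            if not tokens or tokens[-1][1]:
                raise ValueError("'+' must follow a literal character")
            tokens[-1] = (tokens[-1][0], True)
        else:
            tokens.append((ch, False))
    return tokens


def match_with_plus(text, pattern):
    """Start positions i such that some prefix of text[i:] matches pattern."""
    tokens = parse_plus_pattern(pattern)
    n = len(text)
    starts = []
    for i in range(n):
        frontier = {i}  # text positions reachable after matching a token prefix
        for ch, repeatable in tokens:
            nxt = set()
            for q in frontier:
                # Consume one mandatory copy of ch, then (if repeatable) more.
                while q < n and text[q] == ch:
                    q += 1
                    nxt.add(q)
                    if not repeatable:
                        break
            frontier = nxt
            if not frontier:
                break
        if frontier:
            starts.append(i)
    return starts
```

For example, match_with_plus("xabbbcabc", "ab+c") returns [1, 6]. Tracking a set of reachable positions, rather than committing to one run length, is what lets a repeatable token absorb a whole run of equal characters without guessing how many copies to take.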
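The '*' bullet does not say which two special cases the paper handles, so the block below is only a general sequential baseline, not either of those MPC algorithms. It assumes '*' matches any, possibly empty, substring and that the pattern does not begin with '*'. It splits the pattern at each '*' and, for every position where the first segment occurs, places the remaining segments greedily at their leftmost occurrences. match_with_star is a hypothetical name.

```python
def match_with_star(text, pattern):
    """Start positions i at which some occurrence of pattern begins.

    Assumes '*' matches any (possibly empty) substring of the text.
    """
    segments = pattern.split("*")
    first, rest = segments[0], segments[1:]
    if not first:
        # A leading '*' (or an empty pattern) would make every position match.
        raise ValueError("pattern must start with a literal segment")
    starts = []
    for i in range(len(text) - len(first) + 1):
        if not text.startswith(first, i):
            continue
        pos = i + len(first)
        ok = True
        for seg in rest:
            # Greedy: take the leftmost occurrence of the next segment.
            j = text.find(seg, pos)
            if j == -1:
                ok = False
                break
            pos = j + len(seg)
        if ok:
            starts.append(i)
    return starts
```

For example, match_with_star("aXbaYb", "a*b") returns [0, 3]. The greedy choice is safe because placing a segment at its leftmost feasible occurrence never leaves less room for the segments that follow it.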