A pattern $\alpha$ is a string of variables and terminal letters. We say that $\alpha$ matches a word $w$, consisting only of terminal letters, if $w$ can be obtained by replacing the variables of $\alpha$ by terminal words. The matching problem, i.e., deciding whether a given pattern matches a given word, was heavily investigated: it is NP-complete in general, but can be solved efficiently for classes of patterns with restricted structure. In this paper, we approach this problem in a generalized setting, by considering approximate pattern matching under Hamming distance. More precisely, we are interested in what is the minimum Hamming distance between $w$ and any word $u$ obtained by replacing the variables of $\alpha$ by terminal words. Firstly, we address the class of regular patterns (in which no variable occurs twice) and propose efficient algorithms for this problem, as well as matching conditional lower bounds. We show that the problem can still be solved efficiently if we allow repeated variables, but restrict the way the different variables can be interleaved according to a locality parameter. However, as soon as we allow a variable to occur more than once and its occurrences can be interleaved arbitrarily with those of other variables, even if none of them occurs more than once, the problem becomes intractable.
翻译:模式 $\ alpha$ 是一系列变量和终端字母。 我们说, $\ alpha$ 符合一个单字, 仅由终端字母组成, 只要能用终端字替换 $\ alpha$ 的变量, 就能用美元来获得。 匹配问题, 即决定给定模式是否与给定单词相匹配, 得到了大量调查: 它一般是NP- 完整的, 但对于结构受限制的模式类别, 可以有效解决 。 在本文中, 我们从一个普遍的角度来解决这个问题, 考虑在 Hamming 距离下匹配大致模式。 更确切地说, 我们感兴趣的是, 美元和 美元 美元 和 任何 美元 美元 之间最小的宽度距离, 以终端字替换 $\ ALpha$ 的变量 。 首先, 我们处理常规模式的类别( 没有变量两次出现), 并提出有效的算法, 以及有条件的下限 。 我们显示, 如果允许重复变量, 问题仍然可以有效解决 问题 。 但是限制 不同变量的间断 的方式 与地点 参数 的 参数 。 但是, 很快 就会 。