We present the first algorithm for regular expression matching that can take advantage of sparsity in the input instance. Our main result is a new algorithm that solves regular expression matching in $O\left(\Delta \log \log \frac{nm}{\Delta} + n + m\right)$ time, where $m$ is the number of positions in the regular expression, $n$ is the length of the string, and $\Delta$ is the \emph{density} of the instance, defined as the total number of states in a simulation of the position automaton. This measure is a lower bound on the total number of states in simulations of all classic polynomial sized finite automata. Our bound improves the best known bounds for regular expression matching by almost a linear factor in the density of the problem. The key component in the result is a novel linear space representation of the position automaton that supports state-set transition computation in near-linear time in the size of the input and output state sets.
翻译:我们为正则表达式匹配提供了第一个算法, 可以利用输入实例中的宽度。 我们的主要结果是一个新的算法, 解决正则表达式匹配值在$O\left (\ Delta\log\log\ frac{ nm\Delta} + n + m\right) 美元时间中, 美元是正则表达式中的位置数, 美元是字符串的长度, 美元是实例中的 $\ Delta$, 定义为位置自动图的模拟中状态的总数 。 这个计量对于所有经典多面体大小的自动图的模拟中状态的总数来说, 限制较小 。 我们的边框将常规表达式的已知最佳边框增加为问题密度中几乎一个线性系数的匹配值 。 结果的关键部分是支持在输入和输出状态组大小的近线性时间中进行状态设定转换的状态转换的方位的新的线性空间表示 。