Linear-time pattern matching engines have seen promising results using Finite Automata (FA) as their computation model. Among different FA variants, deterministic (DFA) and non-deterministic (NFA) are the most commonly used computation models for FA-based pattern matching engines. Moreover, NFA is the prevalent model in pattern matching engines on spatial architectures. The reasons are: i) DFA size, as in #states, can be exponential compared to equivalent NFA, ii) DFA cannot exploit the massive parallelism available on spatial architectures. This paper performs an empirical study on the #state of minimized DFA and optimized NFA across a diverse set of real-world benchmarks and shows that if distinct DFAs are generated for distinct patterns, #states of minimized DFA are typically equal to their equivalent optimized NFA. However, NFA is more robust in maintaining the low #states for some benchmarks. Thus, the choice of NFA vs. DFA for spatial architecture is less important than the need to generate distinct DFAs for each pattern and support these distinct DFAs' parallel processing. Finally, this paper presents a throughput study for von Neumann's architecture-based (CPU) vs. spatial architecture-based (FPGA) automata processing engines. The study shows that, based on the workload, neither CPU-based automata processing engine nor FPGA-based automata processing engine is the clear winner. If #patterns matched per workload increases, the CPU-based automata processing engine's throughput decreases. On the other hand, the FPGA-based automata processing engine lacks the memory spilling option; hence, it fails to accommodate an entire automata if it does not fit into FPGA's logic fabric. In the best-case scenario, the CPU has a 4.5x speedup over the FPGA, while for some benchmarks, the FPGA has a 32,530x speedup over the CPU.
翻译:使用 Finite Automata (FA) 进行线上模式匹配的引擎,其结果很有希望。 在不同的 FA 变量中, 确定性(DFA) 和非确定性(NFA) 是FA 模式匹配引擎最常用的计算模型。 此外, NFA 是空间架构中模式匹配引擎的常用模型。 原因是 i) DFA 与 # States 相比, DFA 的大小可以指数化, 与 NFA 相当, ii) DFA 无法利用空间架构中现有的大规模平行关系。 本文对“ 非确定性(DFA) 状态进行实证性研究, 在一个不同的实体世界基准中, 优化的自动确定性(DFA ) 优化的 NFA 模式, 显示不同的 DFA 格式处理模式。 最后, # DFA 通常等同于最佳的 NFA。 但是, NFA 以低 状态维持某些基准的“ ” 状态。 因此, 以 NFA 基础的 DFA 为空间结构选择, 并不重要, 而不是基于基于空间结构, 比需要产生明确的 DFA,, 以每个模式产生不同的 DFA, 并支持 并支持这些格式处理。