Accelerating finite automata processing is critical for advancing real-time analytics in pattern matching, data mining, bioinformatics, intrusion detection, and machine learning. Recent in-memory automata accelerators leveraging SRAMs and DRAMs have shown exciting improvements over conventional digital designs. However, the bit-vector representation of state transitions used by all state-of-the-art (SOTA) designs is optimal only for the worst case of completely random patterns, while a significant amount of memory and energy is wasted when running most real-world benchmarks. We present CAMA, a Content-Addressable Memory (CAM) enabled Automata accelerator for processing homogeneous non-deterministic finite automata (NFA). A radically different state representation scheme, together with co-designed novel circuits and data encoding schemes, greatly reduces energy, memory, and chip area for most realistic NFAs. CAMA is holistically optimized with the following major contributions: (1) a 16x256 8-transistor (8T) CAM array for state matching, replacing the 256x256 6T SRAM array or two 16x256 6T SRAM banks in SOTA designs; (2) a novel encoding scheme that enables content searching within 8T SRAMs and adapts to different applications; (3) a reconfigurable and scalable architecture that improves efficiency on all tested benchmarks without losing support for any NFA that is compatible with SOTA designs; (4) an optimization framework that automates the choice of encoding schemes and maps a given NFA to the proposed hardware. Two versions of CAMA, one optimized for energy (CAMA-E) and the other for throughput (CAMA-T), are comprehensively evaluated in a 28nm CMOS process across 21 real-world and synthetic benchmarks. CAMA-E achieves 2.1x, 2.8x, and 2.04x lower energy than CA, 2-stride Impala, and eAP, respectively. CAMA-T shows 2.68x, 3.87x, and 2.62x higher average compute density than 2-stride Impala, CA, and eAP, respectively.
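To make the contrast between the bit-vector representation and content searching more concrete, here is a minimal Python sketch of state matching in a homogeneous NFA, where every state accepts one symbol class. It is not CAMA's circuit or encoding. The ternary (value, care) entry is a generic stand-in assumed for illustration, the names `bitvector_rows`, `ternary_entry`, and `run` are hypothetical, and the "start states stay enabled on every symbol" behavior is also an assumption.

```python
from typing import List, Optional, Set, Tuple


def bitvector_rows(classes: List[Set[int]]) -> List[int]:
    """Bit-vector scheme: 256 rows, one per input byte.
    Row b is a bitmask over states; bit s is set iff state s accepts byte b.
    Storage is 256 bits per state, regardless of the class."""
    rows = [0] * 256
    for s, cls in enumerate(classes):
        for b in cls:
            rows[b] |= 1 << s
    return rows


def ternary_entry(cls: Set[int]) -> Optional[Tuple[int, int]]:
    """Hypothetical content-search scheme: describe a class by one ternary
    entry (value, care) so that byte b matches iff b & care == value.
    Storage is 16 bits per state. Returns None if the class is not a
    single ternary cube (e.g. {0x30, 0x39})."""
    and_all, or_all = 0xFF, 0x00
    for b in cls:
        and_all &= b
        or_all |= b
    care = ~(and_all ^ or_all) & 0xFF        # bit positions where all members agree
    cube_size = 1 << (8 - bin(care).count("1"))
    if len(cls) != cube_size:                # members do not fill the whole cube
        return None
    return and_all & care, care


def run(classes, succ, start, accept, data: bytes, use_cam: bool = False) -> List[int]:
    """Simulate a homogeneous NFA; return input offsets where a reporting state matches."""
    rows = bitvector_rows(classes)
    entries = [ternary_entry(c) for c in classes]
    if use_cam and None in entries:
        raise ValueError("this sketch needs every class to be one ternary entry")
    active, reports = start, []
    for i, b in enumerate(data):
        if use_cam:   # compare the byte against every stored entry (parallel in hardware)
            hit = 0
            for s, (value, care) in enumerate(entries):
                if b & care == value:
                    hit |= 1 << s
        else:         # read one row of the 256-row bit-vector table
            hit = rows[b]
        matched = hit & active                   # state matching
        if matched & accept:
            reports.append(i)
        nxt = start                              # start states stay enabled on every symbol
        for s in range(len(classes)):            # state transition
            if (matched >> s) & 1:
                nxt |= succ[s]
        active = nxt
    return reports


# Toy pattern "a[0-7]": state 0 accepts 'a', state 1 accepts bytes 0x30-0x37.
classes = [{ord("a")}, set(range(0x30, 0x38))]
succ = [0b10, 0b00]          # state 0 enables state 1
start, accept = 0b01, 0b10
print(run(classes, succ, start, accept, b"a5a9a0"))                # [1, 5]
print(run(classes, succ, start, accept, b"a5a9a0", use_cam=True))  # [1, 5]
```

Both paths report the same offsets, but the bit-vector table spends 256 bits per state while the toy ternary entry spends 16. The catch is coverage: a class such as {0x30, 0x39} is not a single cube and would need more entries or a fallback in this sketch, which is one reason an encoding scheme may need to adapt to the application.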