Bulk bitwise operations, i.e., bitwise operations on large bit vectors, are prevalent in a wide range of important application domains, including databases, graph processing, genome analysis, cryptography, and hyper-dimensional computing. In conventional systems, the performance and energy efficiency of bulk bitwise operations are bottlenecked by data movement between the compute units and the memory hierarchy. In-flash processing (i.e., processing data inside NAND flash chips) has high potential to accelerate bulk bitwise operations by fundamentally reducing data movement through the entire memory hierarchy. We identify two key limitations of the state-of-the-art in-flash processing technique for bulk bitwise operations: (i) it falls short of maximally exploiting the bit-level parallelism of bulk bitwise operations; (ii) it is unreliable because it does not consider the highly error-prone nature of NAND flash memory. We propose Flash-Cosmos (Flash Computation with One-Shot Multi-Operand Sensing), a new in-flash processing technique that significantly increases the performance and energy efficiency of bulk bitwise operations while providing high reliability. Flash-Cosmos introduces two key mechanisms that can be easily supported in modern NAND flash chips: (i) Multi-Wordline Sensing (MWS), which enables bulk bitwise operations on a large number of operands with a single sensing operation, and (ii) Enhanced SLC-mode Programming (ESP), which enables reliable computation inside NAND flash memory. We demonstrate the feasibility of performing bulk bitwise operations with high reliability in Flash-Cosmos by testing 160 real 3D NAND flash chips. Our evaluation shows that Flash-Cosmos improves average performance and energy efficiency by 3.5x/32x and 3.3x/95x, respectively, over the state-of-the-art in-flash/outside-storage processing techniques across three real-world applications.
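To make the term "multi-operand bulk bitwise operation" concrete, the following is a minimal software sketch of the result such an operation computes. It is not a model of NAND flash sensing or of the Flash-Cosmos mechanisms; the helper name `bulk_bitwise` and the example vectors are hypothetical, and Python integers stand in for bit vectors.

```python
from functools import reduce
from operator import and_, or_


def bulk_bitwise(op, operands):
    """Combine equal-length bit vectors (stored as Python ints) with op.

    Hypothetical helper for illustration only: it computes in software the
    value that a single multi-operand bitwise operation would produce.
    """
    if not operands:
        raise ValueError("need at least one operand")
    return reduce(op, operands)


# Hypothetical example data: three 8-bit vectors.
vectors = [0b11110000, 0b10101010, 0b11001100]

and_result = bulk_bitwise(and_, vectors)  # bit is 1 only if set in every operand
or_result = bulk_bitwise(or_, vectors)    # bit is 1 if set in any operand

print(f"AND: {and_result:08b}")
print(f"OR:  {or_result:08b}")
```

In a conventional system, each operand in `vectors` would have to travel from storage to the compute units before being combined; the abstract's claim is that combining many operands inside the flash chip avoids that movement.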