Sorting is needed in many application domains. Conventionally, the data is read from memory and sent to a general-purpose processor or application-specific hardware for sorting, and the sorted data is then written back to memory. Reading and writing the data and transferring it between the memory and the processing unit incur large latency and energy overheads. In this work, we develop what are, to the best of our knowledge, the first architectures for in-memory sorting of data. We propose two architectures. The first applies to the conventional data representation, weighted binary radix. The second targets emerging unary processing systems, in which data is encoded as uniform unary bitstreams. The two architectures have different advantages and disadvantages, making one or the other more suitable for a given application. Common to both, however, is a significant reduction in processing time compared to prior sorting designs. Our evaluations show average energy reductions of 37x and 138x for the binary and unary designs, respectively, compared to conventional CMOS off-memory sorting systems.
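To make the unary representation more concrete, the following is a minimal software sketch, not the in-memory architecture proposed here. It assumes a thermometer-style encoding in which a value is represented by that many 1s aligned to the end of a fixed-length bitstream; under that assumption, the bitwise AND of two streams gives their minimum and the bitwise OR gives their maximum, so a compare-and-swap unit reduces to two bitwise operations. The function names (`to_unary`, `from_unary`, `compare_and_swap`, `unary_sort`) and the choice of an odd-even transposition network are illustrative assumptions and are not taken from the paper.

```python
def to_unary(value, length):
    """Encode an integer in [0, length] as a unary bitstream.

    Assumed encoding: `value` ones aligned to the end, zeros first.
    """
    if not 0 <= value <= length:
        raise ValueError("value must lie in [0, length]")
    return [0] * (length - value) + [1] * value


def from_unary(bits):
    """Decode a unary bitstream by counting its ones."""
    return sum(bits)


def compare_and_swap(a, b):
    """Return (min, max) of two equal-length aligned unary streams.

    With aligned unary streams, bitwise AND yields the minimum and
    bitwise OR yields the maximum.
    """
    lo = [x & y for x, y in zip(a, b)]
    hi = [x | y for x, y in zip(a, b)]
    return lo, hi


def unary_sort(values, length):
    """Sort integers using only AND/OR compare-and-swap units.

    Uses an odd-even transposition network (an illustrative choice):
    n rounds of neighbour compare-and-swap sort n inputs.
    """
    streams = [to_unary(v, length) for v in values]
    n = len(streams)
    for rnd in range(n):
        # Even rounds pair (0,1),(2,3),...; odd rounds pair (1,2),(3,4),...
        for i in range(rnd % 2, n - 1, 2):
            streams[i], streams[i + 1] = compare_and_swap(
                streams[i], streams[i + 1]
            )
    return [from_unary(s) for s in streams]


if __name__ == "__main__":
    print(unary_sort([5, 2, 7, 0, 3], length=8))
```

The sketch only illustrates why unary data can be sorted with simple logic; it says nothing about the latency or energy of either proposed design.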