Accelerating Time Series Analysis via Processing using Non-Volatile Memories

Time Series Analysis (TSA) is a critical workload for consumer-facing devices. Accelerating TSA is vital for many domains, as it enables the extraction of valuable information and the prediction of future events. The state-of-the-art algorithm in TSA is the subsequence Dynamic Time Warping (sDTW) algorithm. However, sDTW's computational complexity increases quadratically with the time series' length, resulting in two performance implications. First, the amount of data parallelism available is significantly higher than the small number of processing units offered by commodity systems (e.g., CPUs). Second, sDTW is bottlenecked by memory because it 1) has low arithmetic intensity and 2) incurs a large memory footprint. To tackle these two challenges, we leverage Processing-using-Memory (PuM), which performs in-situ computation where the data resides, using the memory cells themselves. PuM provides a promising solution to alleviate data movement bottlenecks and exposes immense parallelism. In this work, we present MATSA, the first MRAM-based Accelerator for Time Series Analysis. The key idea is to exploit magneto-resistive memory crossbars to enable energy-efficient and fast time series computation in memory. MATSA provides the following key benefits: 1) it leverages high levels of parallelism in the memory substrate by exploiting column-wise arithmetic operations, and 2) it significantly reduces data movement costs by performing computation using the memory cells. We evaluate three versions of MATSA to match the requirements of different environments (e.g., embedded, desktop, or HPC computing) based on MRAM technology trends. We perform a design space exploration and demonstrate that our HPC version of MATSA can improve performance by 7.35x/6.15x/6.31x and energy efficiency by 11.29x/4.21x/2.65x over server CPU, GPU, and PNM architectures, respectively.
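To make the quadratic cost of sDTW concrete, the following is a minimal software sketch of a textbook subsequence DTW recurrence. It is not MATSA's in-memory implementation: the function name `sdtw_distance`, the absolute-difference point cost, and the two-row buffering are assumptions chosen for illustration only.

```python
import math


def sdtw_distance(query, reference):
    """Return (best_cost, end_index) for the best match of `query` inside `reference`.

    Illustrative sketch only: the point-wise cost (absolute difference)
    is an assumption, not necessarily the metric used by MATSA.
    """
    m, n = len(query), len(reference)
    if m == 0 or n == 0:
        raise ValueError("both series must be non-empty")

    # Row 0 is all zeros, so a match may start at any position of the reference.
    prev = [0.0] * (n + 1)
    for i in range(1, m + 1):
        # Column 0 stays infinite: the query cannot be matched against nothing.
        curr = [math.inf] * (n + 1)
        for j in range(1, n + 1):
            cost = abs(query[i - 1] - reference[j - 1])
            # Each cell depends on its upper, left, and upper-left neighbours.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr

    # The match may end anywhere, so take the minimum of the last row.
    best_end = min(range(1, n + 1), key=lambda j: prev[j])
    return prev[best_end], best_end - 1


if __name__ == "__main__":
    print(sdtw_distance([1, 2, 3], [0, 0, 1, 2, 3, 0, 0]))  # (0.0, 4)
```

The nested loops update one cell per (query, reference) pair, so the work grows with the product of the two lengths, while each cell performs only a handful of arithmetic operations on values fetched from memory. This is the low arithmetic intensity noted above.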
