Long Short-Term Memory (LSTM) recurrent networks are frequently used for tasks involving time-sequential data such as speech recognition. Unlike previous LSTM accelerators that exploit either spatial weight sparsity or temporal activation sparsity, this paper proposes a new accelerator called "Spartus" that exploits spatio-temporal sparsity to achieve ultra-low-latency inference. Spatial sparsity is induced using a new Column-Balanced Targeted Dropout (CBTD) structured pruning method, which produces structured sparse weight matrices for a balanced workload. The pruned networks running on Spartus hardware achieve weight sparsity of up to 96% and 94% with negligible accuracy loss on the TIMIT and Librispeech datasets, respectively. To induce temporal sparsity in LSTMs, we extend the previous DeltaGRU method to the DeltaLSTM method. Combining spatio-temporal sparsity with CBTD and DeltaLSTM saves on weight memory access and associated arithmetic operations. The Spartus architecture is scalable and supports real-time online speech recognition when implemented on small and large FPGAs. The Spartus per-sample latency for a single DeltaLSTM layer of 1024 neurons averages 1 µs. Exploiting spatio-temporal sparsity leads to a 46X speedup of Spartus over its theoretical hardware performance, achieving 9.4 TOp/s effective batch-1 throughput and 1.1 TOp/s/W power efficiency.
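To make the column-balanced pruning idea concrete, here is a minimal NumPy sketch; the function name `column_balanced_prune_mask` is ours, not from the paper, and the full CBTD method applies this selection stochastically as targeted dropout during training rather than as a one-shot mask. The key property illustrated is that every column retains the same number of nonzero weights, so each hardware column lane receives an equal share of the work.

```python
import numpy as np

def column_balanced_prune_mask(W, sparsity):
    """Binary mask that keeps the largest-magnitude weights independently
    in each column, so every column retains the same number of nonzeros
    (a balanced per-column workload for the hardware)."""
    rows, _cols = W.shape
    keep = max(1, int(round(rows * (1.0 - sparsity))))
    mask = np.zeros_like(W, dtype=bool)
    for c in range(W.shape[1]):
        top = np.argsort(np.abs(W[:, c]))[-keep:]  # row indices of the largest |w|
        mask[top, c] = True
    return mask

# Usage: prune a 1024x1024 layer to 90% sparsity with balanced columns.
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024))
W_pruned = W * column_balanced_prune_mask(W, sparsity=0.9)
col_nnz = (W_pruned != 0).sum(axis=0)
assert (col_nnz == col_nnz[0]).all()  # identical nonzero count per column
```

This per-column balance is what distinguishes CBTD-style structured sparsity from unstructured magnitude pruning, which would leave some processing elements idle while others finish late.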
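Similarly, a toy sketch of the delta-network principle behind DeltaLSTM, assuming the standard DeltaGRU/DeltaLSTM formulation: a vector element fires only when it has changed by at least a threshold Θ since the value it last propagated, and the pre-activation is maintained incrementally so that non-firing columns of the weight matrix are skipped. The class name `DeltaMatVec` is illustrative; DeltaLSTM applies this update to both the input and hidden-state vectors of every gate.

```python
import numpy as np

class DeltaMatVec:
    """Toy delta-network matrix-vector product: columns of W whose input
    element has not changed by at least `theta` since its last propagated
    value contribute a zero delta and can be skipped entirely."""

    def __init__(self, W, theta):
        self.W = W
        self.theta = theta
        self.x_ref = np.zeros(W.shape[1])  # last propagated input values
        self.z = np.zeros(W.shape[0])      # running pre-activation memory

    def step(self, x_t):
        delta = x_t - self.x_ref
        fire = np.abs(delta) >= self.theta       # which elements exceed theta
        self.x_ref[fire] = x_t[fire]             # update reference only where fired
        self.z += self.W[:, fire] @ delta[fire]  # hardware skips non-firing columns
        return self.z, fire.mean()               # pre-activation and firing rate

# Usage: a slowly changing input fires almost no deltas after the first step.
mv = DeltaMatVec(W=np.random.default_rng(1).standard_normal((4, 8)), theta=0.1)
z, rate = mv.step(np.ones(8))          # first step: everything fires (rate == 1.0)
z, rate = mv.step(np.ones(8) + 0.01)   # change below theta: nothing fires, z unchanged
```

Because speech features evolve slowly between frames, most deltas fall below Θ, which is the source of the temporal-sparsity savings in memory access and arithmetic that the abstract reports.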