Memory-augmented neural networks (MANNs) achieve better inference performance in many tasks with the help of an external memory. The recently developed differentiable neural computer (DNC) is a MANN that has been shown to excel at representing complicated data structures and learning long-term dependencies. The DNC's higher performance derives from new history-based attention mechanisms in addition to the previously used content-based attention mechanisms. History-based mechanisms require a variety of new compute primitives and state memories, which are not supported by existing neural network (NN) or MANN accelerators. We present HiMA, a tiled, history-based memory access engine with distributed memories in tiles. HiMA incorporates a multi-mode network-on-chip (NoC) to reduce communication latency and improve scalability. An optimal submatrix-wise memory partition strategy is applied to reduce the amount of NoC traffic, and a two-stage usage sort method leverages distributed tiles to improve computation speed. To make HiMA fundamentally scalable, we create a distributed version of DNC called DNC-D that allows almost all memory operations to be applied to local memories, with a trainable weighted summation producing the global memory output. Two approximation techniques, usage skimming and softmax approximation, are proposed to further enhance hardware efficiency. HiMA prototypes are created in RTL and synthesized in a 40nm technology. In simulations, HiMA running DNC and DNC-D demonstrates 6.47x and 39.1x higher speed, 22.8x and 164.3x better area efficiency, and 6.1x and 61.2x better energy efficiency over the state-of-the-art MANN accelerator. Compared to an Nvidia 3080Ti GPU, HiMA demonstrates speedups of up to 437x and 2,646x when running DNC and DNC-D, respectively.
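The DNC-D idea described above, where each tile operates on its own local memory and a trainable weighted summation combines the per-tile results into the global memory output, can be sketched as follows. This is a minimal illustration under assumed shapes and names (the tile count, memory dimensions, and variable names are hypothetical, not taken from the paper):

```python
import numpy as np

# Hypothetical sketch of DNC-D's distributed read: each tile t holds a local
# memory M_t (N rows of width W) and computes a local read r_t = w_t^T M_t
# using its own addressing weights; a trainable softmax over per-tile logits
# then mixes the local reads into one global read vector.

rng = np.random.default_rng(0)
T, N, W = 4, 16, 8  # number of tiles, rows per local memory, word width

local_memories = [rng.standard_normal((N, W)) for _ in range(T)]
read_weights = [np.full(N, 1.0 / N) for _ in range(T)]  # per-tile addressing

# Local reads stay within each tile (no cross-tile memory traffic)
local_reads = [w @ M for w, M in zip(read_weights, local_memories)]

# Trainable mixing coefficients (learned in training; random logits here)
logits = rng.standard_normal(T)
alpha = np.exp(logits) / np.exp(logits).sum()  # softmax over tiles

# Global memory output: weighted summation of the per-tile reads
global_read = sum(a * r for a, r in zip(alpha, local_reads))
assert global_read.shape == (W,)
```

The design point this illustrates is that only the small per-tile read vectors and the mixing step cross tile boundaries, which is what keeps the NoC traffic low and the architecture scalable.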