The ever-increasing computational complexity of fast-growing Deep Neural Networks (DNNs) calls for new computing paradigms to overcome the memory wall of conventional von Neumann architectures. The emerging Computing-In-Memory (CIM) architecture is a promising candidate for accelerating neural network computing. However, data movement between CIM arrays may still dominate the total power consumption in conventional designs. This paper proposes a flexible CIM processor architecture named Domino that enables stream computing and local data access to significantly reduce data movement energy. Domino further employs tailored, distributed instruction scheduling within a Network-on-Chip (NoC) to implement inter-memory-computing and attain mapping flexibility. Evaluation on prevailing CNN models shows that Domino achieves 1.15-to-9.49$\times$ higher power efficiency than several state-of-the-art CIM accelerators and improves throughput by 1.57-to-12.96$\times$.