Multi-agent reinforcement learning (MARL) is a powerful technology for building interactive artificial intelligence systems in various applications such as multi-robot control and self-driving cars. Unlike supervised learning or single-agent reinforcement learning, which actively exploit network pruning, it is unclear how pruning will work in multi-agent reinforcement learning with its cooperative and interactive characteristics. \par In this paper, we present a real-time sparse training acceleration system named LearningGroup, which, for the first time, applies network pruning to MARL training with an algorithm/architecture co-design approach. We create sparsity using a weight-grouping algorithm and propose an on-chip sparse data encoding loop (OSEL) that enables fast encoding with an efficient implementation. Based on OSEL's encoding format, LearningGroup performs efficient weight compression and allocates the computation workload to multiple cores, where each core handles multiple sparse rows of the weight matrix simultaneously with vector processing units. As a result, the LearningGroup system reduces the cycle time and memory footprint for sparse data generation by up to 5.72x and 6.81x, respectively. Its FPGA accelerator achieves 257.40-3629.48 GFLOPS throughput and 7.10-100.12 GFLOPS/W energy efficiency across various MARL conditions, which is up to 7.13x higher throughput and 12.43x better energy efficiency than an Nvidia Titan RTX GPU, thanks to the fully on-chip training and highly optimized dataflow/data format provided by the FPGA. Most importantly, the accelerator achieves up to 12.52x speedup for processing sparse data over the dense case, the highest among state-of-the-art sparse training accelerators.
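The abstract summarizes the sparse encoding and per-core workload allocation without giving their details. As a rough, generic illustration of the idea only, the Python sketch below encodes a pruned weight matrix in a standard CSR-like format and computes a group of sparse rows assigned to one core; the function names (csr_encode, spmv_rows), the CSR layout, and the two-core split are all illustrative assumptions, not the paper's actual OSEL format or dataflow.

```python
import numpy as np

def csr_encode(weights, threshold=0.0):
    """Encode a dense weight matrix in a CSR-like sparse format.

    Generic compressed-sparse-row sketch; NOT the paper's OSEL format,
    whose exact layout is not described in the abstract.
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in weights:
        nz = np.nonzero(np.abs(row) > threshold)[0]  # surviving weights
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))  # end offset of this sparse row
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def spmv_rows(values, col_idx, row_ptr, x, rows):
    """Sparse matrix-vector product over an assigned group of rows,
    mimicking per-core allocation (each core owns several sparse rows)."""
    out = np.zeros(len(rows))
    for i, r in enumerate(rows):
        s, e = row_ptr[r], row_ptr[r + 1]
        out[i] = values[s:e] @ x[col_idx[s:e]]
    return out

# Usage: prune a weight matrix, encode it, and split rows across two "cores".
W = np.random.randn(8, 8)
W[np.abs(W) < 1.0] = 0.0  # induce sparsity by magnitude pruning
vals, cols, ptr = csr_encode(W)
x = np.random.randn(8)
y = np.concatenate([spmv_rows(vals, cols, ptr, x, range(0, 4)),
                    spmv_rows(vals, cols, ptr, x, range(4, 8))])
assert np.allclose(y, W @ x)
```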