DQN (Deep Q-Network) is a method that performs Q-learning for reinforcement learning using deep neural networks. DQNs require a large buffer and batch processing for experience replay and rely on backpropagation-based iterative optimization, making them difficult to implement on resource-limited edge devices. In this paper, we propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices. It exploits a recently proposed neural-network-based on-device learning approach that does not rely on backpropagation but instead uses an OS-ELM (Online Sequential Extreme Learning Machine) based training algorithm. In addition, we propose a combination of L2 regularization and spectral normalization for the on-device reinforcement learning so that the output values of the neural network fall within a certain range and the reinforcement learning becomes stable. The proposed reinforcement learning approach is designed for the PYNQ-Z1 board as a low-cost FPGA platform. The evaluation results using OpenAI Gym demonstrate that the proposed algorithm and its FPGA implementation complete a CartPole-v0 task 29.77x and 89.40x faster, respectively, than a conventional DQN-based approach when the number of hidden-layer nodes is 64.
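The following is a minimal illustrative sketch (not the paper's implementation) of how an OS-ELM-style Q-function approximator with L2 regularization and spectral normalization might look in Python/NumPy. The class name OSELMQ and hyperparameter names (n_hidden, l2, seed) are placeholders introduced here for illustration; only the output weights are updated sequentially, so no backpropagation is required.

    import numpy as np

    class OSELMQ:
        """Illustrative OS-ELM-based Q-function approximator (single hidden layer)."""

        def __init__(self, n_in, n_hidden, n_out, l2=1e-2, seed=0):
            rng = np.random.default_rng(seed)
            W = rng.standard_normal((n_in, n_hidden))
            # Spectral normalization (assumed form): divide the fixed input
            # weights by their largest singular value so hidden activations,
            # and hence Q-value outputs, stay within a bounded range.
            W /= np.linalg.svd(W, compute_uv=False)[0]
            self.W = W
            self.b = rng.standard_normal(n_hidden)
            # L2 regularization enters through the initial inverse covariance
            # matrix P = (lambda * I)^-1 of the recursive least-squares update.
            self.P = np.eye(n_hidden) / l2
            self.beta = np.zeros((n_hidden, n_out))

        def _hidden(self, X):
            # Fixed random hidden layer; only beta is trained.
            return np.tanh(X @ self.W + self.b)

        def predict(self, X):
            # Q-values, one column per discrete action.
            return self._hidden(X) @ self.beta

        def partial_fit(self, X, T):
            # One OS-ELM sequential (RLS-style) update on a mini-batch (X, T),
            # where T holds the Q-learning targets r + gamma * max_a' Q(s', a').
            H = self._hidden(X)
            K = np.linalg.inv(np.eye(len(X)) + H @ self.P @ H.T)
            self.P -= self.P @ H.T @ K @ H @ self.P
            self.beta += self.P @ H.T @ (T - H @ self.beta)

In a CartPole-style loop, one would call predict to select actions (e.g., epsilon-greedily), form targets from the observed reward and the predicted next-state Q-values, and call partial_fit on each small batch; the closed-form update avoids the large replay buffer and iterative gradient descent that a conventional DQN needs.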