Present-day deep reinforcement learning (RL) systems show great promise towards building intelligent agents surpassing human-level performance. However, the computational complexity of the underlying deep neural networks (DNNs) leads to power-hungry implementations, making deep RL systems unsuitable for deployment on resource-constrained edge devices. To address this challenge, we propose a reconfigurable architecture with preemptive exits for efficient deep RL (RAPID-RL). RAPID-RL enables conditional activation of DNN layers based on the difficulty level of inputs, allowing the compute effort to be adjusted dynamically during inference while maintaining competitive performance. We achieve this by augmenting a deep Q-network (DQN) with side-branches capable of generating intermediate predictions along with associated confidence scores. We also propose a novel training methodology for learning the actions and branch confidence scores in a dynamic RL setting. Our experiments evaluate the proposed framework on Atari 2600 gaming tasks and a realistic drone navigation task on an open-source drone simulator (PEDRA). We show that RAPID-RL incurs 0.34x (0.25x) the number of operations (OPS) while maintaining performance above 0.88x (0.91x) on Atari (drone navigation) tasks, compared to a baseline DQN without any side-branches. The reduction in OPS leads to fast and efficient inference, which is highly beneficial for resource-constrained edge devices where making quick decisions with minimal compute is essential.
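To make the branched-DQN idea concrete, the following is a minimal illustrative sketch (not the authors' code) of a Q-network whose backbone is split into stages, each followed by a side-branch that emits intermediate Q-values and a confidence score; inference exits at the first branch whose confidence clears a threshold. The class and parameter names (BranchyDQN, exit_threshold) are hypothetical, and the sketch assumes PyTorch with 84x84 Atari-style inputs.

```python
# Minimal sketch of a DQN with preemptive-exit side-branches (illustrative only).
import torch
import torch.nn as nn

class BranchHead(nn.Module):
    """Side-branch producing intermediate Q-values and a confidence score."""
    def __init__(self, in_features, num_actions):
        super().__init__()
        self.q_head = nn.Linear(in_features, num_actions)  # intermediate Q-values
        self.conf_head = nn.Linear(in_features, 1)          # branch confidence score

    def forward(self, feat):
        flat = feat.flatten(1)
        return self.q_head(flat), torch.sigmoid(self.conf_head(flat))

class BranchyDQN(nn.Module):
    """Backbone split into stages, each followed by an early-exit branch."""
    def __init__(self, num_actions, exit_threshold=0.9):
        super().__init__()
        self.exit_threshold = exit_threshold
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(4, 32, 8, stride=4), nn.ReLU()),
            nn.Sequential(nn.Conv2d(32, 64, 4, stride=2), nn.ReLU()),
            nn.Sequential(nn.Conv2d(64, 64, 3, stride=1), nn.ReLU()),
        ])
        # Flattened feature sizes assume 84x84 inputs with 4 stacked frames.
        feat_sizes = [32 * 20 * 20, 64 * 9 * 9, 64 * 7 * 7]
        self.branches = nn.ModuleList(
            [BranchHead(s, num_actions) for s in feat_sizes])

    def forward(self, x):
        """Return Q-values from the first branch confident enough to exit.
        Assumes batch size 1 at inference time."""
        for stage, branch in zip(self.stages, self.branches):
            x = stage(x)
            q, conf = branch(x)
            # Exit preemptively on "easy" inputs; deeper layers stay inactive.
            if conf.item() > self.exit_threshold:
                return q
        return q  # fall back to the deepest branch for "hard" inputs
```

Because easy inputs exit at shallow branches, the average number of operations per decision drops, which is the mechanism behind the reported OPS savings.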