Experience replay is an essential component of deep reinforcement learning (DRL): it stores past experiences and samples from them to supply the agent with training data in real time. Recently, prioritized experience replay (PER) has proven powerful and is widely deployed in DRL agents. However, implementing PER on traditional CPU or GPU architectures incurs significant latency overhead due to its frequent and irregular memory accesses. This paper proposes a hardware-software co-design approach: an associative memory (AM) based PER, AMPER, with an AM-friendly priority sampling operation. AMPER replaces the widely used, time-consuming tree-traversal-based priority sampling of PER while preserving learning performance. Further, we design an AM-based in-memory computing hardware architecture that supports AMPER by exploiting parallel in-memory search operations. AMPER achieves comparable learning performance while delivering a 55x to 270x latency improvement on the proposed hardware compared to state-of-the-art PER running on a GPU.
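For context, the tree-traversal-based priority sampling that AMPER replaces is conventionally implemented with a sum tree. The minimal Python sketch below is illustrative only (it is not the paper's code; the class and method names are assumptions) and shows why each sampled transition requires a chain of data-dependent memory reads from root to leaf, the access pattern the abstract identifies as the CPU/GPU latency bottleneck.

```python
import random

class SumTree:
    """Minimal sum tree for proportional priority sampling, as used in
    conventional PER. Each sample() walks from the root to a leaf:
    O(log N) dependent, irregular memory accesses per draw."""

    def __init__(self, capacity):
        self.capacity = capacity              # number of stored transitions
        self.tree = [0.0] * (2 * capacity)    # node i holds the priority sum of its subtree
        self.data = [None] * capacity         # transitions, one per leaf
        self.next = 0                         # ring-buffer write pointer

    def add(self, priority, transition):
        leaf = self.next + self.capacity
        self.data[self.next] = transition
        self.update(leaf, priority)
        self.next = (self.next + 1) % self.capacity

    def update(self, leaf, priority):
        change = priority - self.tree[leaf]
        idx = leaf
        while idx >= 1:                       # propagate the change up to the root
            self.tree[idx] += change
            idx //= 2

    def sample(self):
        """Draw one transition with probability proportional to its priority."""
        s = random.uniform(0.0, self.tree[1])  # tree[1] is the total priority mass
        idx = 1
        while idx < self.capacity:            # descend: each step reads a child chosen by the data
            left = 2 * idx
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
        return self.data[idx - self.capacity], self.tree[idx]
```

In this baseline, every sampled transition triggers a serial root-to-leaf traversal; AMPER's AM-friendly sampling is designed to avoid this traversal by using parallel in-memory search instead.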