With Deep Neural Networks (DNNs) as powerful function approximators, Deep Reinforcement Learning (DRL) has been successfully demonstrated on robotic control tasks. Compared to DNNs built from vanilla artificial neurons, the biologically plausible Spiking Neural Network (SNN) contains a diverse population of spiking neurons, which makes it naturally well suited to representing states with both spatial and temporal information. Based on a hybrid learning framework, in which a spiking actor network infers actions from states and a deep critic network evaluates the actor, we propose a Population-coding and Dynamic-neurons improved Spiking Actor Network (PDSAN) for efficient state representation at two scales: input coding and neuronal coding. For input coding, we apply population coding with dynamic receptive fields to directly encode each input state component. For neuronal coding, we propose different types of dynamic neurons (with 1st-order and 2nd-order neuronal dynamics) to describe more complex neuronal dynamics. Finally, the PDSAN is trained in conjunction with deep critic networks using the Twin Delayed Deep Deterministic policy gradient algorithm (TD3-PDSAN). Extensive experimental results show that our TD3-PDSAN model achieves better performance than state-of-the-art models on four OpenAI gym benchmark tasks. This work is an important attempt to improve RL with SNNs toward effective computation that remains biologically plausible.
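To make the two coding scales more concrete, the following is a minimal sketch and not the PDSAN implementation. It assumes Gaussian receptive fields with fixed, evenly spaced centers (the dynamic, learnable receptive fields described above are not reproduced here) and a simple first-order leaky integrate-and-fire neuron with a hard reset. The function names `population_encode` and `lif_spikes`, and all parameter values, are hypothetical choices for illustration only.

```python
import numpy as np

def population_encode(state, low, high, pop_size=10, sigma=None):
    """Map each state component to `pop_size` Gaussian receptive-field activations."""
    state = np.asarray(state, dtype=float)
    # Assumption: centers are fixed and evenly spaced over [low, high].
    # In a trainable model these (and sigma) would be learnable parameters.
    mu = np.linspace(low, high, pop_size)
    if sigma is None:
        sigma = (high - low) / pop_size
    # Broadcast (state_dim, 1) against (1, pop_size).
    act = np.exp(-0.5 * ((state[:, None] - mu[None, :]) / sigma) ** 2)
    return act  # shape (state_dim, pop_size), values in (0, 1]

def lif_spikes(current, steps=20, decay=0.5, v_th=0.5):
    """First-order leaky integrate-and-fire neurons driven by a constant current."""
    v = np.zeros_like(current)
    spikes = np.zeros((steps,) + current.shape)
    for t in range(steps):
        v = decay * v + current          # leaky integration of the input
        fired = v >= v_th                # threshold crossing emits a spike
        spikes[t] = fired
        v = np.where(fired, 0.0, v)      # hard reset after a spike
    return spikes

# A two-dimensional state, each component encoded by a population of 5 neurons.
state = [0.0, 0.9]
act = population_encode(state, low=-1.0, high=1.0, pop_size=5)
spikes = lif_spikes(act)
rates = spikes.mean(axis=0)  # firing rate per encoding neuron
```

In this sketch, the neuron whose receptive-field center is closest to a state component receives the largest input current and therefore fires most often, so each scalar input is spread over a population of spike trains. A second-order neuron model would add a further state variable to the membrane update, which is omitted here.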