Modelling the behaviours of other agents is essential for understanding how agents interact and for making effective decisions. Existing methods for agent modelling commonly assume knowledge of the local observations and chosen actions of the modelled agents during execution. To eliminate this assumption, we extract representations from the local information of the controlled agent using encoder-decoder architectures. Using the observations and actions of the modelled agents during training, our models learn to extract representations about the modelled agents conditioned only on the local observations of the controlled agent. The representations are used to augment the controlled agent's decision policy, which is trained via deep reinforcement learning; thus, during execution, the policy does not require access to other agents' information. We provide a comprehensive evaluation and ablation studies in cooperative, competitive, and mixed multi-agent environments, showing that our method achieves higher returns than baseline methods which do not use the learned representations.
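The training/execution asymmetry described above can be sketched in a few lines: an encoder maps the controlled agent's local observation to an embedding, a decoder is trained to reconstruct the modelled agent's information from that embedding, and at execution time only the encoder is used to augment the policy input. This is a minimal numpy sketch under assumed dimensions and a single linear layer per component; the actual architecture, losses, and RL training loop in the work are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM = 8       # controlled agent's local observation size (assumed)
EMBED_DIM = 4     # learned representation size (assumed)
MODELLED_DIM = 6  # modelled agent's observation+action target size (assumed)

# Encoder: controlled agent's local observation -> embedding.
W_enc = rng.normal(0, 0.1, (OBS_DIM, EMBED_DIM))
# Decoder: embedding -> modelled agent's observation/action. Used only
# during training; discarded at execution time.
W_dec = rng.normal(0, 0.1, (EMBED_DIM, MODELLED_DIM))

def encode(obs):
    return np.tanh(obs @ W_enc)

def train_step(obs, modelled_target, lr=0.01):
    """One gradient step on the MSE reconstruction loss.

    `modelled_target` (the modelled agent's observation/action) is only
    available at training time, matching the setting in the abstract.
    """
    global W_enc, W_dec
    z = np.tanh(obs @ W_enc)
    pred = z @ W_dec
    err = pred - modelled_target           # dL/dpred
    grad_W_dec = np.outer(z, err)
    grad_z = err @ W_dec.T
    grad_pre = grad_z * (1.0 - z ** 2)     # tanh derivative
    grad_W_enc = np.outer(obs, grad_pre)
    W_dec -= lr * grad_W_dec
    W_enc -= lr * grad_W_enc
    return float(np.mean(err ** 2))

def policy_input(obs):
    """Execution-time input to the RL policy: the local observation
    augmented with the learned representation. No access to other
    agents' information is needed."""
    return np.concatenate([obs, encode(obs)])

obs = rng.normal(size=OBS_DIM)
target = rng.normal(size=MODELLED_DIM)
losses = [train_step(obs, target) for _ in range(200)]
print(losses[-1] < losses[0])       # reconstruction loss decreased
print(policy_input(obs).shape)      # (OBS_DIM + EMBED_DIM,) == (12,)
```

The key design point the sketch mirrors is that the decoder (and hence the modelled agents' data) is a training-time scaffold: once the encoder is learned, the policy consumes only `policy_input(obs)`, computed from the controlled agent's own observation.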