Biological and artificial agents need to deal with constant changes in the real world. We study this problem in four classical continuous control environments, augmented with morphological perturbations. Learning to locomote when the length and the thickness of different body parts vary is challenging, as the control policy is required to adapt to the morphology to successfully balance and advance the agent. We show that a control policy based on the proprioceptive state performs poorly with highly variable body configurations, while an (oracle) agent with access to a learned encoding of the perturbation performs significantly better. We introduce DMAP, a biologically inspired, attention-based policy network architecture. DMAP combines independent proprioceptive processing, a distributed policy with individual controllers for each joint, and an attention mechanism that dynamically gates sensory information from different body parts to different controllers. Despite not having access to the (hidden) morphology information, DMAP can be trained end-to-end in all the considered environments, overall matching or surpassing the performance of an oracle agent. Thus DMAP, implementing principles from biological motor control, provides a strong inductive bias for learning challenging sensorimotor tasks. Overall, our work corroborates the power of these principles in locomotion tasks with variable morphology.
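The gating idea behind DMAP (per-body-part sensory processing, one controller per joint, and attention deciding which body parts each controller listens to) can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation and may differ from the paper's exact formulation. It assumes that each body part's proprioceptive features arrive as a separate vector, that a shared encoder turns every part into a key and a value, and that each joint controller owns one learned query and uses single-head scaled dot-product attention. All names (`DistributedAttentionPolicy`, `attention`, `act`) are hypothetical, and the weights are random and untrained.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


class DistributedAttentionPolicy:
    """Hypothetical DMAP-style policy: one controller per joint, each attending over body parts."""

    def __init__(self, n_joints: int, obs_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Shared encoder, applied independently to every body part's features.
        self.w_key = rand_matrix(hidden_dim, obs_dim, rng)
        self.w_value = rand_matrix(hidden_dim, obs_dim, rng)
        # One query vector and one linear output head per joint controller.
        self.queries = rand_matrix(n_joints, hidden_dim, rng)
        self.heads = rand_matrix(n_joints, hidden_dim, rng)

    def attention(self, parts: List[Vector]) -> List[Vector]:
        """Return, for each joint controller, its attention weights over body parts."""
        keys = [[math.tanh(x) for x in matvec(self.w_key, p)] for p in parts]
        scale = math.sqrt(self.hidden_dim)
        return [
            softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])
            for query in self.queries
        ]

    def act(self, parts: List[Vector]) -> Vector:
        """Map per-body-part observations to one action per joint, squashed to [-1, 1]."""
        values = [[math.tanh(x) for x in matvec(self.w_value, p)] for p in parts]
        actions = []
        for weights, head in zip(self.attention(parts), self.heads):
            # Gating: this controller sees a weighted mix of body-part encodings.
            context = [
                sum(w * v[i] for w, v in zip(weights, values))
                for i in range(self.hidden_dim)
            ]
            actions.append(math.tanh(sum(h * c for h, c in zip(head, context))))
        return actions


if __name__ == "__main__":
    policy = DistributedAttentionPolicy(n_joints=6, obs_dim=4, hidden_dim=8)
    # Seven hypothetical body parts with four proprioceptive features each.
    parts = [[0.1 * (i + j) for j in range(4)] for i in range(7)]
    print(policy.act(parts))  # six actions, one per joint
```

Because the softmax is computed separately for each controller, every joint can weight the body parts differently, and the weights change with the current observation; this is one simple way to realize dynamic gating. In a trained agent the encoder, queries, and heads would be learned end-to-end with a reinforcement learning algorithm, which this sketch omits.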