In deep reinforcement learning, low-level sensory and motor signals live in high-dimensional spaces (e.g., image observations or motor torques) and are difficult to understand or to harness directly for downstream tasks. While sensory representations have been widely studied, the representations of actions that form motor skills remain underexplored. In this work, we find that when a multi-task policy network takes states and task embeddings as input, a space built on the task embeddings emerges that, under moderate constraints, contains meaningful action representations. Within this space, interpolated or composed embeddings can serve as a high-level interface for instructing the agent to perform meaningful action sequences. Empirical results show that the proposed action representations are effective for intra-action interpolation and inter-action composition with limited or no learning, and that they outperform strong baselines in task adaptation on MuJoCo locomotion tasks. This evidence suggests that learning action representations is a promising direction toward efficient, adaptable, and composable RL, and that it can form a basis for abstract action planning and for understanding the motor signal space. Anonymous project page: https://sites.google.com/view/emergent-action-representation/
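To make the idea of using embeddings as a high-level interface concrete, the following is a minimal sketch of intra-action interpolation, not the authors' implementation. Everything named here is an assumption for illustration: the `ToyPolicy` class, its random untrained weights, the embedding dimensions, and in particular the unit-norm projection, which stands in for the "moderate constraints" that the abstract does not specify.

```python
import math
import random


def normalize(v):
    """Scale v to unit L2 norm (assumed stand-in for the 'moderate constraints')."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def interpolate(z_a, z_b, alpha):
    """Linearly blend two task embeddings, then project back to unit norm."""
    mixed = [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]
    return normalize(mixed)


class ToyPolicy:
    """Hypothetical one-layer policy: action = tanh(W @ [state; embedding])."""

    def __init__(self, state_dim, embed_dim, action_dim, seed=0):
        rng = random.Random(seed)
        in_dim = state_dim + embed_dim
        # Random, untrained weights: the actions are not meaningful,
        # the class only illustrates the input/output interface.
        self.weights = [
            [rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
            for _ in range(action_dim)
        ]

    def act(self, state, embedding):
        x = list(state) + list(embedding)
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
                for row in self.weights]


# Two hypothetical task embeddings (e.g., two target running speeds).
z_slow = normalize([1.0, 0.0, 0.0])
z_fast = normalize([0.0, 1.0, 0.0])

policy = ToyPolicy(state_dim=4, embed_dim=3, action_dim=2)
state = [0.1, -0.2, 0.3, 0.0]

# Sweep the interpolation coefficient and query the policy with each embedding.
for alpha in (0.0, 0.5, 1.0):
    z = interpolate(z_slow, z_fast, alpha)
    action = policy.act(state, z)
    print(alpha, [round(a, 3) for a in action])
```

In a trained multi-task policy, sweeping `alpha` would be expected to move the behavior between the two source tasks; here the weights are random, so only the interface is illustrated.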