Reactive motion generation problems are usually solved by computing actions as a sum of policies. However, these policies are independent of each other, and summing their contributions can therefore produce conflicting behaviors. We introduce Composable Energy Policies (CEP), a novel framework for modular reactive motion generation. CEP computes the control action by optimizing over the product of a set of stochastic policies. This product assigns high probability to actions that satisfy all the component policies and low probability to all others. Optimizing over the product of the policies avoids the detrimental effect of conflicting behaviors by choosing an action that satisfies all the objectives. Moreover, we show that CEP adapts naturally to the reinforcement learning problem, allowing us to integrate, in a hierarchical fashion, any distribution as a prior, from multimodal to non-smooth distributions, and to learn a new policy on top of it.
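To make the core idea concrete, here is a minimal sketch of optimizing over a product of policies, under simplifying assumptions not taken from the paper: each component policy is an isotropic Gaussian over a 2-D action, and the optimization is a sampling-based search rather than CEP's actual optimizer. The composed score of a candidate action is the sum of the component log-densities (i.e., the log of the product of policies), so the selected action is one that all components find likely.

```python
import numpy as np

def log_prob_gaussian(a, mean, std):
    # Log-density (up to a constant) of an isotropic Gaussian policy component.
    return -0.5 * np.sum(((a - mean) / std) ** 2, axis=-1)

def product_policy_action(component_means, component_stds,
                          n_samples=2000, seed=0):
    # Sketch of the product-of-policies idea: score sampled candidate
    # actions by the SUM of component log-densities (the log of the
    # product of policies) and return the best-scoring candidate.
    # (Hypothetical helper; the paper's optimizer may differ.)
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
    total_log_prob = sum(
        log_prob_gaussian(candidates, m, s)
        for m, s in zip(component_means, component_stds)
    )
    return candidates[np.argmax(total_log_prob)]

# Two partially conflicting component policies: the product favors an
# action that both components assign high probability, near (0.25, 0.25),
# instead of naively summing/averaging their outputs.
means = [np.array([0.5, 0.0]), np.array([0.0, 0.5])]
stds = [0.3, 0.3]
a = product_policy_action(means, stds)
```

For two Gaussians of equal variance, the product is itself a Gaussian centered at the mean of the two means, which is why the selected action lands near (0.25, 0.25); with conflicting objectives of unequal confidence, the tighter (lower-variance) component pulls the product toward itself.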