Self-assessment rules, which verify the feasibility of a selected action before actual execution, play an essential role in safe and effective real-world robotic applications. However, how to utilize the self-assessment results to re-select actions remains a challenge. Previous methods eliminate the selected action once it is evaluated as a failure by the self-assessment rules and re-select the action with the next-highest affordance (i.e., the process-of-elimination strategy [1]), which ignores the dependency between the self-assessment results and the remaining untried actions. This dependency is important, since previous failures may help prune the remaining over-estimated actions. In this paper, we set out to investigate this dependency by learning a failure-aware policy. We propose two architectures for the failure-aware policy: one represents the self-assessment results of previous failures as an explicit variable in the state, and the other leverages a recurrent neural network to implicitly memorize the previous failures. Experiments conducted on three tasks demonstrate that our method achieves better performance, reaching higher task success rates with fewer trials. Moreover, when the actions are correlated, learning a failure-aware policy outperforms the process-of-elimination strategy.
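To make the two architectures concrete, below is a minimal sketch in a PyTorch style, assuming a discrete action space. Class names, layer sizes, and the binary failure-mask encoding are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class ExplicitFailurePolicy(nn.Module):
    """Variant 1: previous self-assessment failures enter the state as an
    explicit binary mask over actions (1 = already tried and judged failed)."""

    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + num_actions, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, obs: torch.Tensor, failure_mask: torch.Tensor) -> torch.Tensor:
        # Concatenate the observation with the failure mask so the policy can
        # down-weight untried actions that are correlated with observed failures.
        logits = self.net(torch.cat([obs, failure_mask], dim=-1))
        # Actions already assessed as failures are never re-selected.
        return logits.masked_fill(failure_mask.bool(), float("-inf"))


class RecurrentFailurePolicy(nn.Module):
    """Variant 2: a recurrent network implicitly memorizes the sequence of
    failed attempts instead of receiving them as an explicit state variable."""

    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + num_actions, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, obs, prev_action_onehot, hidden=None):
        # One step per re-selection attempt: the previously failed action is fed
        # back in, and the GRU hidden state accumulates the failure history.
        x = torch.cat([obs, prev_action_onehot], dim=-1).unsqueeze(1)
        out, hidden = self.rnn(x, hidden)
        return self.head(out.squeeze(1)), hidden
```

In both variants the policy is re-queried after each failed self-assessment, so its action distribution can shift based on which candidates have already been ruled out, rather than simply falling back to the next-highest affordance.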