Many important robotics problems are partially observable: a single visual or force-feedback measurement is insufficient to reconstruct the state. Standard approaches learn a policy over either beliefs or observation-action histories. Both have drawbacks: tracking the belief online is expensive, and learning policies directly over histories is hard. We propose a method for policy learning under partial observability, the Belief-Grounded Network (BGN), in which an auxiliary belief-reconstruction loss incentivizes a neural network to concisely summarize its input history. Because the resulting policy is a function of the history rather than the belief, it can be executed easily at runtime. We compare BGN against several baselines on classic benchmark tasks as well as three novel robotic touch-sensing tasks. BGN outperforms all other tested methods, and its learned policies work well when transferred onto a physical robot.
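The idea of an auxiliary belief-reconstruction loss can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a small discrete problem whose transition and observation models are known at training time, so that a target belief can be computed with a standard Bayes filter, and it assumes a cross-entropy form for the auxiliary loss, which the abstract does not specify. All names (`belief_update`, `belief_reconstruction_loss`, `total_loss`, `aux_weight`) are hypothetical, and `belief_logits` stands in for the output of a belief head on a history-summarizing network that is not shown.

```python
import math


def belief_update(belief, action, observation, T, O):
    """Discrete Bayes filter step (standard technique, used here to build targets).

    b'(s') is proportional to O[a][s'][o] * sum_s T[a][s][s'] * b(s).
    """
    n = len(belief)
    predicted = [
        sum(T[action][s][s2] * belief[s] for s in range(n)) for s2 in range(n)
    ]
    unnorm = [O[action][s2][observation] * predicted[s2] for s2 in range(n)]
    z = sum(unnorm)
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / z for u in unnorm]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def belief_reconstruction_loss(belief_logits, target_belief):
    """Cross-entropy between the target belief and the predicted belief.

    Assumption: cross-entropy is one plausible choice of reconstruction loss.
    """
    predicted = softmax(belief_logits)
    return -sum(
        t * math.log(p) for t, p in zip(target_belief, predicted) if t > 0.0
    )


def total_loss(policy_loss, belief_logits, target_belief, aux_weight=1.0):
    """Policy loss plus the weighted auxiliary term (aux_weight is hypothetical)."""
    aux = belief_reconstruction_loss(belief_logits, target_belief)
    return policy_loss + aux_weight * aux


# Toy two-state problem with one "listen" action (index 0) that leaves the
# state unchanged and reports the true state with probability 0.85.
T = [[[1.0, 0.0], [0.0, 1.0]]]      # T[a][s][s']
O = [[[0.85, 0.15], [0.15, 0.85]]]  # O[a][s'][o]

target = belief_update([0.5, 0.5], action=0, observation=0, T=T, O=O)
uninformed = belief_reconstruction_loss([0.0, 0.0], target)
matched = belief_reconstruction_loss([math.log(0.85), math.log(0.15)], target)
combined = total_loss(1.0, [0.0, 0.0], target, aux_weight=0.5)
```

In this sketch the belief is needed only to compute the training loss. At runtime the policy would consume the history alone, which matches the abstract's point that no belief tracking is required during execution.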