Our goal is to develop theory and algorithms for establishing fundamental limits on performance for a given task imposed by a robot's sensors. In order to achieve this, we define a quantity that captures the amount of task-relevant information provided by a sensor. Using a novel version of the generalized Fano inequality from information theory, we demonstrate that this quantity provides an upper bound on the highest achievable expected reward for one-step decision making tasks. We then extend this bound to multi-step problems via a dynamic programming approach. We present algorithms for numerically computing the resulting bounds, and demonstrate our approach on three examples: (i) the lava problem from the literature on partially observable Markov decision processes, (ii) an example with continuous state and observation spaces corresponding to a robot catching a freely-falling object, and (iii) obstacle avoidance using a depth sensor with non-Gaussian noise. We demonstrate the ability of our approach to establish strong limits on achievable performance for these problems by comparing our upper bounds with achievable lower bounds (computed by synthesizing or learning concrete control policies).
翻译:我们的目标是发展理论和算法,为机器人传感器强加的任务的性能设定基本限值。 为了实现这一目标,我们定义一个数量,捕捉传感器提供的任务相关信息的数量。 使用信息理论中普遍法诺不平等的新型版本,我们证明这一数量为一步骤决策任务提供了最高可实现的最高预期奖赏。 然后,我们通过动态编程方法将这一数量限值扩大到多步问题。 我们提出数字计算结果限值的算法,并用三个例子展示我们的方法:(一) 部分观测到马尔科夫决策过程的文献中的熔岩问题,(二) 持续状态和观测空间与机器人自由撞击物体相对应的实例,以及(三) 使用带有非高加索噪音的深度传感器避免障碍。我们通过将我们的上限值与可实现的低限值进行比较(通过合成或学习具体控制政策),我们的方法能够为这些问题的可实现性能设定严格的限度。