Uncertainty in perception, actuation, and the environment often means that a robotic task requires multiple attempts to succeed. We study a class of problems that provides (1) low-entropy indicators of terminal success / failure, and (2) unreliable (high-entropy) data for predicting the final outcome of an ongoing task. Examples include a robot trying to connect with a charging station, parallel parking, or assembling a tightly fitting part. The ability to restart after predicting failure early, rather than simply running to failure, can significantly decrease the makespan, that is, the total time to completion, with the drawback of potentially cutting short an otherwise successful operation. Assuming task running times to be Poisson distributed, and using a Markov jump process to capture the dynamics of the underlying Markov decision process, we derive a closed-form solution that predicts makespan based on the confusion matrix of the failure predictor. This allows the robot to learn failure prediction in a production environment, and to adopt a preemptive policy only when it actually saves time. We demonstrate this approach on a robotic peg-in-hole assembly problem using a real robotic system. Failures are predicted by a dilated convolutional network based on force-torque data, showing an average makespan reduction from 101s to 81s (N=120, p<0.05). We posit that the proposed algorithm generalizes to any robotic behavior with an unambiguous terminal reward, with wide-ranging implications for how robots can learn and improve their behaviors in the wild.
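The trade-off described above can be illustrated numerically. The sketch below is a minimal Monte Carlo simulation, not the paper's closed-form solution: attempt durations are drawn from an exponential distribution, each attempt succeeds with a fixed probability, and a failure predictor characterized by its true-positive and false-positive rates fires part-way through a run. All parameter values (`p_success`, `mean_run`, `detect_frac`, `tpr`, `fpr`) are hypothetical illustration choices, not values from the experiments.

```python
import random

def simulate_makespan(p_success=0.5, mean_run=10.0, detect_frac=0.3,
                      tpr=0.8, fpr=0.1, preempt=True,
                      trials=20000, seed=0):
    """Monte Carlo estimate of mean makespan (time until first success).

    Each attempt runs for an exponentially distributed duration. If
    preempt is True, a predictor fires at detect_frac of the attempt:
    with probability tpr it correctly flags a doomed attempt (true
    positive, saving the remaining run time), and with probability fpr
    it wrongly aborts an attempt that would have succeeded (false
    positive). Flagged attempts are restarted immediately.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        t = 0.0
        while True:
            dur = rng.expovariate(1.0 / mean_run)
            will_succeed = rng.random() < p_success
            if preempt:
                flagged = rng.random() < (fpr if will_succeed else tpr)
                if flagged:
                    t += detect_frac * dur   # abort early and restart
                    continue
            t += dur                         # attempt ran to completion
            if will_succeed:
                break                        # terminal success observed
            # ran to failure without a flag: restart
        total += t
    return total / trials

# With a reasonably accurate predictor, preempting reduces makespan:
baseline = simulate_makespan(preempt=False)
with_preemption = simulate_makespan(preempt=True)
```

With these illustrative numbers, the run-to-failure baseline averages mean_run / p_success = 20 time units per success, while early aborting of most doomed attempts brings the average noticeably lower, at the cost of occasionally discarding a would-be success.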