We propose a novel planning technique for satisfying tasks specified in temporal logic in partially revealed environments. We define high-level actions derived from the environment and the given task itself, and estimate how each action contributes to progress towards completing the task. As the map is revealed, we estimate the cost and probability of success of each action from images and an encoding of that action using a trained neural network. These estimates guide search for the minimum-expected-cost plan within our model. Our learned model is structured to generalize across environments and task specifications without requiring retraining. We demonstrate an improvement in total cost in both simulated and real-world experiments compared to a heuristic-driven baseline.
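To make the idea of a minimum-expected-cost plan concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `Action` fields, the `fallback_cost` term, and the brute-force search over orderings are all illustrative assumptions, and the hard-coded probabilities and costs stand in for the values that the abstract says are estimated by a trained neural network. The sketch assumes that a failed action incurs its failure cost, after which the agent attempts the next action in the ordering.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Action:
    """Hypothetical high-level action with estimated properties."""
    name: str
    p_success: float     # estimated probability the action succeeds
    cost_success: float  # estimated cost incurred if it succeeds
    cost_failure: float  # estimated cost incurred if it fails


def expected_cost(order, fallback_cost):
    """Expected cost of attempting actions in sequence until one succeeds.

    fallback_cost is an assumed penalty paid if every action fails.
    """
    total = 0.0
    p_reach = 1.0  # probability that all earlier actions failed
    for a in order:
        step = a.p_success * a.cost_success + (1.0 - a.p_success) * a.cost_failure
        total += p_reach * step
        p_reach *= 1.0 - a.p_success
    return total + p_reach * fallback_cost


def best_plan(actions, fallback_cost):
    """Exhaustively pick the ordering with the lowest expected cost."""
    return min(permutations(actions), key=lambda o: expected_cost(o, fallback_cost))


# Illustrative numbers only; in the paper's setting these would be learned estimates.
likely = Action("likely_route", p_success=0.9, cost_success=10.0, cost_failure=5.0)
unlikely = Action("unlikely_route", p_success=0.1, cost_success=2.0, cost_failure=20.0)
plan = best_plan([likely, unlikely], fallback_cost=100.0)
print([a.name for a in plan], expected_cost(plan, 100.0))
```

Exhaustive enumeration is only practical for a handful of actions. It is used here to keep the expected-cost recursion visible, not as a claim about how the search is carried out in the paper.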