We formalize decision-making problems in robotics and automated control using continuous MDPs and actions that take place over continuous time intervals. We then approximate the continuous MDP using finer and finer discretizations. Doing this results in a family of systems, each of which has an extremely large action space, although only a few actions are "interesting". We can view the decision maker as being unaware of which actions are "interesting". We can model this using MDPUs, MDPs with unawareness, where the action space is much smaller. As we show, MDPUs can be used as a general framework for learning tasks in robotics problems. We prove results on the difficulty of learning a near-optimal policy in an MDPU for a continuous task. We apply these ideas to the problem of having a humanoid robot learn on its own how to walk.