Predictive models are effective for reasoning about human motion, a crucial factor in the safety and efficiency of human-robot interaction. However, robots often lack access to key parameters of such models, for example, the human's objectives, level of distraction, and willingness to cooperate. Dual control theory addresses this challenge by treating the unknown parameters as stochastic hidden states and identifying their values from information gathered while controlling the robot. Despite its ability to optimally and automatically trade off exploration and exploitation, dual control is computationally intractable for general human-in-the-loop motion planning, mainly because of the nested trajectory optimization and human intent prediction. In this paper, we present a novel algorithmic approach that enables active uncertainty learning for human-in-the-loop motion planning based on the implicit dual control paradigm. Our approach relies on a sampling-based approximation of stochastic dynamic programming, leading to a model predictive control problem that can be readily solved by real-time gradient-based optimization methods. The resulting policy is shown to preserve the dual control effect for generic human predictive models with both continuous and categorical uncertainty. The efficacy of our approach is demonstrated with simulated driving examples.
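To make the idea of treating an unknown parameter as a hidden state concrete, the following is a minimal sketch of a single Bayesian belief update over a categorical human parameter. It is not the algorithm of this paper: it shows only the passive inference step, whereas dual control additionally chooses robot actions while anticipating how such updates will reduce uncertainty. The hypothesis names, the Gaussian observation model, the predicted accelerations, and the noise level `sigma` are all assumptions made for illustration.

```python
import math


def gaussian_likelihood(observed, predicted, sigma):
    """Density of the observed human action under a Gaussian centered on the prediction."""
    z = (observed - predicted) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def update_belief(belief, observed_action, predicted_actions, sigma=0.5):
    """One Bayes step: posterior(theta) is proportional to likelihood(obs | theta) * prior(theta)."""
    unnormalized = {
        theta: belief[theta]
        * gaussian_likelihood(observed_action, predicted_actions[theta], sigma)
        for theta in belief
    }
    total = sum(unnormalized.values())
    return {theta: weight / total for theta, weight in unnormalized.items()}


# Hypothetical categorical hidden state: is the human driver cooperative or not?
belief = {"cooperative": 0.5, "non_cooperative": 0.5}

# Hypothetical model predictions of the human's acceleration (m/s^2) under each hypothesis.
predicted = {"cooperative": -1.0, "non_cooperative": 1.0}

# The robot observes the human braking, which is closer to the cooperative prediction.
posterior = update_belief(belief, observed_action=-0.8, predicted_actions=predicted)
print(posterior)
```

In this toy setting the observation shifts most of the probability mass toward the cooperative hypothesis. A planner with the dual control effect would go one step further and favor robot actions that make the two predictions differ more, so that future observations are more informative.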