The ability to accurately predict the opponent's behavior is central to the safety and efficiency of robotic systems in interactive settings, such as human-robot interaction and multi-robot teaming tasks. Unfortunately, robots often lack access to key information on which these predictions may hinge, such as the opponent's goals, attention, and willingness to cooperate. Dual control theory addresses this challenge by treating unknown parameters of a predictive model as hidden states and inferring their values at runtime using information gathered during system operation. Although it can trade off exploration and exploitation optimally and automatically, dual control is computationally intractable for general interactive motion planning. In this paper, we present a novel algorithmic approach to enable active uncertainty reduction for interactive motion planning based on the implicit dual control paradigm. Our approach relies on a sampling-based approximation of stochastic dynamic programming, leading to a model predictive control problem. The resulting policy is shown to preserve the dual control effect for a broad class of predictive models with both continuous and categorical uncertainty. To ensure the safe operation of the interacting agents, we leverage a supervisory control scheme, oftentimes referred to as ``shielding'', which overrides the ego agent's dual control policy with a safety fallback strategy when a safety-critical event is imminent. We then augment the dual control framework with an improved variant of the recently proposed shielding-aware robust planning scheme, which proactively balances the nominal planning performance with the risk of high-cost emergency maneuvers triggered by low-probability opponent behaviors. We demonstrate the efficacy of our approach with both simulated driving examples and hardware experiments using 1/10-scale autonomous vehicles.
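As a rough illustration of two ingredients named above, the following Python sketch shows (i) a Bayesian update of a belief over a categorical hidden parameter of the opponent model and (ii) a shielding rule that replaces the nominal action with a fallback when a safety check fails. It is a minimal sketch under stated assumptions, not the paper's algorithm: the function names, the Gaussian observation model, the two hypotheses (`yielding`, `aggressive`), and all numeric values are hypothetical. It also covers only passive inference; the implicit dual control policy described in the abstract would additionally plan over how future observations change the belief.

```python
import math


def gaussian_likelihood(observed, predicted, sigma):
    """Density of `observed` under a Gaussian centered at `predicted` (assumed noise model)."""
    z = (observed - predicted) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def update_belief(belief, likelihoods):
    """Bayes' rule over a categorical hidden parameter.

    belief: dict mapping hypothesis -> prior probability.
    likelihoods: dict mapping hypothesis -> likelihood of the latest observation.
    """
    unnormalized = {h: belief[h] * likelihoods[h] for h in belief}
    total = sum(unnormalized.values())
    if total == 0.0:
        # Observation is impossible under every hypothesis: keep the prior.
        return dict(belief)
    return {h: w / total for h, w in unnormalized.items()}


def shielded_action(nominal_action, fallback_action, is_safe):
    """Supervisory override: use the fallback when the nominal action fails the safety check."""
    return nominal_action if is_safe(nominal_action) else fallback_action


# Hypothetical example: each hypothesis predicts a different opponent acceleration (m/s^2).
predicted_accel = {"yielding": -1.0, "aggressive": 1.0}
prior = {"yielding": 0.5, "aggressive": 0.5}
observed_accel = 0.8

likelihoods = {
    h: gaussian_likelihood(observed_accel, a, sigma=1.0)
    for h, a in predicted_accel.items()
}
posterior = update_belief(prior, likelihoods)

# Hypothetical safety check: only accelerate if the gap to the opponent is large enough.
gap_m = 2.0
min_gap_m = 5.0
action = shielded_action(
    nominal_action=1.0,    # nominal planner wants to accelerate
    fallback_action=-3.0,  # safety fallback: brake
    is_safe=lambda a: a <= 0.0 or gap_m >= min_gap_m,
)
```

In this toy setting the observation is closer to the `aggressive` prediction, so the posterior shifts toward that hypothesis, and the shield selects the braking fallback because the gap is below the assumed minimum.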