Inverse optimal control methods can be used to characterize behavior in sequential decision-making tasks. Most existing work, however, requires the control signals to be known, or is limited to fully observable or linear systems. This paper introduces a probabilistic approach to inverse optimal control for stochastic non-linear systems with missing control signals and partial observability that unifies existing approaches. By using an explicit model of the noise characteristics of the sensory and control systems of the agent in conjunction with local linearization techniques, we derive an approximate likelihood for the model parameters, which can be computed within a single forward pass. We evaluate our proposed method on stochastic and partially observable versions of classic control tasks, a navigation task, and a manual reaching task. The proposed method has broad applicability, ranging from imitation learning to sensorimotor neuroscience.