Humans leverage the dynamics of the environment and of their own bodies to accomplish challenging tasks, such as grasping an object while walking past it or pushing off a wall to turn a corner. Such tasks often involve switching dynamics as the robot makes and breaks contact. Learning these dynamics is a challenging problem and prone to model inaccuracies, especially near contact regions. In this work, we present a framework for learning composite dynamical behaviors from expert demonstrations. We learn a switching linear dynamical model, with contacts encoded in the switching conditions, as a close approximation of our system dynamics. We then use discrete-time LQR as a differentiable policy class for data-efficient learning, yielding a control strategy that operates across multiple dynamical modes and accounts for discontinuities due to contact. In addition to predicting interactions with the environment, our policy reacts effectively to inaccurate predictions such as unanticipated contacts. Through simulation and real-world experiments, we demonstrate generalization of the learned behaviors to different scenarios and robustness to model inaccuracies during execution.
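The differentiable policy class mentioned above is discrete-time LQR. As a minimal, self-contained sketch of what that entails, the following implements the standard backward Riccati recursion for a finite-horizon discrete-time LQR controller; the function name, cost weights, and double-integrator example are illustrative assumptions, not taken from this work:

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, Qf, T):
    """Backward Riccati recursion for discrete-time finite-horizon LQR.

    Returns gains K_0..K_{T-1} so that u_t = -K_t x_t minimizes
    sum_t (x' Q x + u' R u) + x_T' Qf x_T subject to x_{t+1} = A x_t + B u_t.
    """
    P = Qf
    gains = []
    for _ in range(T):
        # K = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P = Q + A' P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # reorder so gains[t] applies at time t

# Illustrative example: a double integrator (position, velocity).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q, R, Qf = np.eye(2), np.array([[0.1]]), 10.0 * np.eye(2)
Ks = finite_horizon_lqr(A, B, Q, R, Qf, T=50)

# Roll out the closed loop from an initial offset; the state is regulated to zero.
x = np.array([[1.0], [0.0]])
for K in Ks:
    x = A @ x - B @ (K @ x)
```

In the switching setting described in the abstract, one such quadratic subproblem would be solved per dynamical mode, with the mode sequence determined by the learned contact conditions.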