Modeling interaction dynamics to generate robot trajectories that enable a robot to adapt and react to a human's actions and intentions is critical for efficient and effective collaborative Human-Robot Interaction (HRI). Learning from Demonstration (LfD) methods that learn from Human-Human Interactions (HHI) have shown promising results, especially when coupled with representation learning techniques. However, such methods for learning HRI either do not scale well to high-dimensional data or cannot accurately adapt to the changing via-poses of the interacting partner. We propose Multimodal Interactive Latent Dynamics (MILD), a method that couples deep representation learning and probabilistic machine learning to address the problem of two-party physical HRI. We learn the interaction dynamics from demonstrations, using Hidden Semi-Markov Models (HSMMs) to model the joint distribution of the interacting agents in the latent space of a Variational Autoencoder (VAE). Our experimental evaluations of learning HRI from HHI demonstrations show that MILD effectively captures the multimodality in the latent representations of HRI tasks, allowing us to decode the varying dynamics that occur in such tasks. Compared to related work, MILD generates more accurate trajectories for the controlled agent (robot) when conditioned on the observed agent's (human) trajectory. Notably, MILD can learn directly from camera-based pose estimations to generate trajectories, which we then map to a humanoid robot without any additional training.
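To make the conditioning idea concrete, the following is a minimal illustrative sketch (not the authors' implementation) of how the robot's latent trajectory segment could be inferred from the observed human's latent within a single Gaussian component of an HSMM defined over the VAE latent space; the function and variable names (condition_gaussian, z_h, d_h, etc.) are assumptions for illustration, and the VAE encoding/decoding and HSMM state sequencing are omitted.

```python
# Illustrative sketch: Gaussian conditioning of a joint (human, robot) latent
# within one HSMM component. Assumes the component's mean/covariance are
# already estimated; names are hypothetical, not from the paper's codebase.
import numpy as np

def condition_gaussian(mu, sigma, z_h, d_h):
    """Given a joint Gaussian over [z_human, z_robot] with mean `mu` and
    covariance `sigma`, return the conditional mean and covariance of the
    robot latent given an observed human latent `z_h` of dimension `d_h`."""
    mu_h, mu_r = mu[:d_h], mu[d_h:]
    s_hh = sigma[:d_h, :d_h]
    s_hr = sigma[:d_h, d_h:]
    s_rh = sigma[d_h:, :d_h]
    s_rr = sigma[d_h:, d_h:]
    gain = s_rh @ np.linalg.inv(s_hh)      # regression of robot latent on human latent
    mu_cond = mu_r + gain @ (z_h - mu_h)   # conditional mean E[z_r | z_h]
    sigma_cond = s_rr - gain @ s_hr        # conditional covariance
    return mu_cond, sigma_cond

# Toy usage: 2-D human latent, 2-D robot latent with a random SPD covariance.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
sigma = A @ A.T + 1e-3 * np.eye(4)
mu = rng.normal(size=4)
z_h = rng.normal(size=2)                   # "observed" human latent
mu_r, sig_r = condition_gaussian(mu, sigma, z_h, d_h=2)
print(mu_r, np.diag(sig_r))
```

In this reading, the conditional mean would serve as the robot's latent target for the active HSMM state, which a decoder would then map back to joint-space or pose-space commands.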