In this paper, we present self-supervised shared latent embedding (S3LE), a data-driven motion retargeting method that enables the generation of natural motions in humanoid robots from motion capture data or RGB videos. While it requires paired data consisting of human poses and their corresponding robot configurations, it substantially reduces the need for time-consuming data collection through novel paired-data generation processes. Our self-supervised learning procedure consists of two steps: automatically generating paired data to bootstrap the motion retargeting, and learning a projection-invariant mapping to handle the differing expressivity of humans and humanoid robots. Furthermore, our method guarantees that the generated robot pose is collision-free and satisfies joint position limits by utilizing nonparametric regression in the shared latent space. We demonstrate that our method can generate expressive robotic motions from both the CMU motion capture database and YouTube videos.
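To make the last step more concrete, the following is a minimal sketch of nonparametric regression in a shared latent space, assuming a Nadaraya-Watson (Gaussian-kernel) regressor over stored paired data; the function and variable names (`kernel_regress_pose`, `Z_data`, `Q_data`, `gamma`) are hypothetical, not from the paper. Because the output is a convex combination of stored configurations, box-type joint position limits are preserved whenever every stored configuration satisfies them; guaranteeing collision-freeness additionally depends on how the paired data are validated and how the regression is designed in the paper, which this sketch does not reproduce.

```python
import numpy as np

def kernel_regress_pose(z_query, Z_data, Q_data, gamma=10.0):
    """Nadaraya-Watson kernel regression in the shared latent space (illustrative sketch).

    z_query : latent embedding of the human pose, shape (d,)
    Z_data  : latent embeddings of the stored paired data, shape (N, d)
    Q_data  : corresponding robot joint configurations, shape (N, dof),
              assumed pre-verified to be collision-free and within joint limits
    """
    d2 = np.sum((Z_data - z_query) ** 2, axis=1)   # squared distances in latent space
    w = np.exp(-gamma * d2)                        # Gaussian kernel weights
    w = w / (w.sum() + 1e-12)                      # normalize to a convex combination
    return w @ Q_data                              # weighted average of stored configurations

# Usage with synthetic data (shapes only; not the paper's dataset):
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 16))                     # latent codes of paired data
Q = rng.uniform(-1.0, 1.0, size=(500, 20))         # joint angles (rad), within assumed limits
q_out = kernel_regress_pose(Z[0] + 0.01, Z, Q)     # retargeted robot configuration
```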