Long-term human motion prediction is essential in safety-critical applications such as human-robot interaction and autonomous driving. In this paper, we show that long-term forecasting does not require predicting the human pose at every time instant. Instead, it is more effective to predict a few keyposes and approximate the intermediate poses by interpolating between them. We demonstrate that our approach enables us to predict realistic motions for up to 5 seconds into the future, far longer than the typical 1 second encountered in the literature. Furthermore, because we model future keyposes probabilistically, we can generate multiple plausible future motions by sampling at inference time. Over this extended time period, our predictions are more realistic, more diverse, and better preserve the motion dynamics than those produced by state-of-the-art methods.
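To make the keypose idea concrete, the following minimal sketch reconstructs a dense motion from a few keyposes. It assumes plain linear interpolation and a pose stored as a flat list of joint coordinates; the abstract does not specify the interpolation scheme or the pose representation, so the function name `interpolate_keyposes`, the timestamps, and the toy values are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def interpolate_keyposes(key_times, keyposes, query_times):
    """Linearly interpolate poses between keyposes (illustrative assumption).

    key_times: strictly increasing timestamps (seconds) of the keyposes.
    keyposes: list of poses, each a flat list of joint coordinates.
    query_times: timestamps at which to reconstruct a pose.
    """
    if len(key_times) != len(keyposes) or len(key_times) < 2:
        raise ValueError("need at least two keyposes with matching times")
    poses = []
    for t in query_times:
        # Clamp to the covered time range so we never extrapolate.
        t = min(max(t, key_times[0]), key_times[-1])
        # Index of the keypose at or before t, capped so that i + 1 is valid.
        i = min(bisect_right(key_times, t) - 1, len(key_times) - 2)
        t0, t1 = key_times[i], key_times[i + 1]
        w = (t - t0) / (t1 - t0)  # blend weight in [0, 1]
        poses.append(
            [(1 - w) * a + w * b for a, b in zip(keyposes[i], keyposes[i + 1])]
        )
    return poses


# Toy example: three keyposes of a 2-coordinate "pose" spanning 5 seconds.
key_times = [0.0, 2.0, 5.0]
keyposes = [[0.0, 0.0], [2.0, 4.0], [5.0, 1.0]]
motion = interpolate_keyposes(key_times, keyposes, [0.0, 1.0, 2.0, 3.5, 5.0])
```

In this sketch only three poses need to be predicted to cover the 5-second horizon; every other frame is derived from them. Sampling several keypose sequences from a probabilistic model and interpolating each one would yield multiple candidate motions, but that sampling step is not shown here.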