Understanding and anticipating human activity is an important capability for intelligent systems in mobile robotics, autonomous driving, and video surveillance. While learning from demonstrations with trajectory data collected on site is a powerful approach for discovering recurrent motion patterns, generalization to new environments, where sufficient motion data are not readily available, remains a challenge. In many cases, however, semantic information about the environment is a highly informative cue for predicting pedestrian motion or estimating collision risks. In this work, we infer occupancy priors of human motion using only semantic environment information as input. To this end, we apply and discuss a traditional Inverse Optimal Control approach and propose a novel one based on Convolutional Neural Networks (CNNs) to predict future occupancy maps. Our CNN method produces flexible, context-aware occupancy estimates for semantically uniform map regions and generalizes well even with small amounts of training data. Evaluated on synthetic and real-world data, it shows superior results compared to several baselines, marking a qualitative step up in semantic environment assessment.