In this paper, we address the problem of forecasting the trajectory of an egocentric camera wearer (ego-person) in crowded spaces. The trajectory forecasting ability learned from data of different camera wearers walking around in the real world can be transferred to assist visually impaired people in navigation, and to instill human navigation behaviours in mobile robots, enabling better human-robot interaction. To this end, we construct a novel egocentric human trajectory forecasting dataset containing real trajectories of people navigating crowded spaces while wearing a camera, together with rich extracted contextual data. We extract and exploit three different modalities to forecast the trajectory of the camera wearer: his/her past trajectory, the past trajectories of nearby people, and the environment, such as the scene semantics or the depth of the scene. We design a Transformer-based encoder-decoder neural network, integrated with a novel cascaded cross-attention mechanism that fuses the multiple modalities, to predict the future trajectory of the camera wearer. Extensive experiments show that our model outperforms state-of-the-art methods in egocentric human trajectory forecasting.