Accurate prediction of a person's future location and movement trajectory from an egocentric wearable camera can benefit a wide range of applications, such as assisting visually impaired people in navigation and developing mobility assistance for people with disabilities. In this work, a new egocentric dataset was constructed using a wearable camera, consisting of 8,250 short clips of a targeted person either walking 1) toward, 2) away from, or 3) across the camera wearer in indoor environments, or 4) staying still in the scene; 13,817 person bounding boxes were manually labelled. Apart from the bounding boxes, the dataset also contains the estimated pose of the targeted person as well as the IMU signal of the wearable camera at each time point. An LSTM-based encoder-decoder framework was designed to predict the future location and movement trajectory of the targeted person in this egocentric setting. Extensive experiments on the new dataset show that the proposed method reliably predicts future person location and trajectory in egocentric videos captured by the wearable camera, outperforming three baselines.
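To make the described architecture concrete, the following is a minimal PyTorch sketch of how an LSTM-based encoder-decoder could fuse the three input modalities (past bounding boxes, person pose, and camera IMU) and autoregressively decode future bounding boxes. The class name `TrajectoryPredictor`, all dimensions, and the feature-concatenation scheme are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TrajectoryPredictor(nn.Module):
    """Minimal LSTM encoder-decoder sketch for future bounding-box prediction.

    All dimensions and design choices below are illustrative assumptions;
    they are not taken from the paper.
    """

    def __init__(self, box_dim=4, pose_dim=36, imu_dim=6, hidden_dim=128):
        super().__init__()
        # Per-frame input: concatenated box, pose, and IMU features (assumed).
        input_dim = box_dim + pose_dim + imu_dim
        self.encoder = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(box_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, box_dim)  # regress (x, y, w, h) per step

    def forward(self, boxes, pose, imu, horizon=10):
        # boxes: (B, T, 4), pose: (B, T, pose_dim), imu: (B, T, imu_dim)
        feats = torch.cat([boxes, pose, imu], dim=-1)
        _, (h, c) = self.encoder(feats)     # summarize the observed sequence
        step = boxes[:, -1:, :]             # seed decoder with last observed box
        preds = []
        for _ in range(horizon):            # autoregressive roll-out
            dec_out, (h, c) = self.decoder(step, (h, c))
            step = self.out(dec_out)
            preds.append(step)
        return torch.cat(preds, dim=1)      # (B, horizon, 4) future boxes
```

In this sketch the encoder's final hidden state initializes the decoder, which is seeded with the last observed bounding box and unrolled one future step at a time; this is one common way to realize an encoder-decoder trajectory predictor and is offered only as a plausible reading of the framework described above.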