Trajectory prediction is an essential task for successful human-robot interaction, such as in autonomous driving. In this work, we address the problem of predicting future pedestrian trajectories in a first-person view setting with a moving camera. To that end, we propose a novel action-based contrastive learning loss that uses pedestrian action information to improve the learned trajectory embeddings. The fundamental idea behind this loss is that trajectories of pedestrians performing the same action should be closer to each other in the feature space than trajectories of pedestrians performing significantly different actions. In other words, we argue that behavioral information about a pedestrian's action influences their future trajectory. Furthermore, we introduce a novel sampling strategy for trajectories that effectively increases the number of negative and positive contrastive samples. Additional synthetic trajectory samples are generated using a trained Conditional Variational Autoencoder (CVAE), a model at the core of several trajectory prediction methods. Results show that the proposed contrastive framework makes effective use of contextual information about pedestrian behavior, i.e., action, and learns a better trajectory representation. Integrating the proposed contrastive framework into a trajectory prediction model therefore improves its results and outperforms state-of-the-art methods on three trajectory prediction benchmarks [31, 32, 26].
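To make the core idea concrete, the following is a minimal sketch of one plausible form an action-based contrastive loss could take. It is not the formulation used in this work: it assumes a supervised-contrastive (InfoNCE-style) objective over a batch of trajectory embeddings, and the function name `action_contrastive_loss`, the `temperature` parameter, and the toy embeddings are all hypothetical choices made for illustration. Synthetic samples (for example, ones decoded from a CVAE) could be appended to the batch as extra rows under the same assumption.

```python
import numpy as np

def action_contrastive_loss(embeddings, actions, temperature=0.1):
    """Illustrative supervised-contrastive loss (an assumption, not the paper's loss).

    Embeddings that share an action label are treated as positives,
    all other embeddings in the batch as negatives.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(actions)
    n = z.shape[0]

    self_mask = np.eye(n, dtype=bool)
    # Positives: same action label, excluding the anchor itself.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0  # no anchor has a positive partner in this batch

    # L2-normalise so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = z @ z.T / temperature
    # Remove self-similarity from the softmax denominator.
    logits = np.where(self_mask, -np.inf, logits)

    # Numerically stable log-softmax over each row.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average log-probability of the positives for each valid anchor.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    loss_per_anchor = -pos_log_prob[valid] / pos_count[valid]
    return float(loss_per_anchor.mean())

# Toy batch: two embeddings near each axis (hypothetical values).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# Labels that agree with the geometry vs. labels that contradict it.
loss_aligned = action_contrastive_loss(toy_embeddings, ["walk", "walk", "stand", "stand"])
loss_mixed = action_contrastive_loss(toy_embeddings, ["walk", "stand", "walk", "stand"])
print(loss_aligned, loss_mixed)
```

In this toy batch the loss is lower when embeddings with the same action label already lie close together, which is the behavior the action-based objective is meant to encourage during training.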