Predicting human motion behavior in a crowd is important for many applications, ranging from the natural navigation of autonomous vehicles to intelligent video surveillance for security. All previous works model and predict the trajectory at a single temporal resolution. This is rather inefficient, and it makes it difficult to exploit simultaneously the long-range information (e.g., the destination of the trajectory) and the short-range information (e.g., the walking direction and speed at a given time) of the motion behavior. In this paper, we propose a temporal pyramid network for pedestrian trajectory prediction built on a squeeze modulation and a dilation modulation. Our hierarchical framework builds a feature pyramid whose temporal information becomes increasingly rich from top to bottom, which allows it to better capture motion behavior at various tempos. Furthermore, we propose a coarse-to-fine fusion strategy with multi-supervision. By progressively merging the coarse top-level features, which carry global context, into the fine bottom-level features, which carry rich local context, our method can fully exploit both the long-range and short-range information of the trajectory. Experimental results on several benchmarks demonstrate the superiority of our method. Our code and models will be made available upon acceptance.
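The abstract does not say how the squeeze modulation, the dilation modulation, or the coarse-to-fine fusion are implemented, so the pure-Python toy below is only a minimal sketch under loudly stated assumptions. "Squeeze" is assumed to be temporal average pooling by a factor of 2, "dilation" is assumed to be nearest-neighbour temporal upsampling, fusion is assumed to be element-wise averaging, and multi-supervision is assumed to be a per-level mean squared error summed over all pyramid levels. In the actual network these steps would presumably be learned layers operating on multi-channel features; all function names here are hypothetical.

```python
from typing import List

# Hypothetical simplification: one scalar feature per time step.
Sequence = List[float]


def squeeze(seq: Sequence, factor: int = 2) -> Sequence:
    """Assumed 'squeeze': temporal average pooling that shortens the sequence."""
    return [
        sum(seq[i:i + factor]) / len(seq[i:i + factor])
        for i in range(0, len(seq), factor)
    ]


def dilate(seq: Sequence, length: int) -> Sequence:
    """Assumed 'dilation': nearest-neighbour upsampling to `length` time steps."""
    return [seq[i * len(seq) // length] for i in range(length)]


def build_pyramid(seq: Sequence, levels: int) -> List[Sequence]:
    """Return levels ordered bottom (finest, full tempo) to top (coarsest)."""
    pyramid = [seq]
    for _ in range(levels - 1):
        pyramid.append(squeeze(pyramid[-1]))
    return pyramid


def coarse_to_fine(pyramid: List[Sequence]) -> List[Sequence]:
    """Top-down fusion: dilate the fused coarser level and average it into the
    next finer level. Returns fused levels ordered coarsest first."""
    fused = [pyramid[-1]]
    for finer in reversed(pyramid[:-1]):
        up = dilate(fused[-1], len(finer))
        fused.append([(a + b) / 2 for a, b in zip(finer, up)])
    return fused


def multi_supervision_loss(fused: List[Sequence], target: Sequence) -> float:
    """Assumed multi-supervision: MSE at every level, with the target squeezed
    down to that level's temporal resolution, summed over all levels."""
    total = 0.0
    for level in fused:
        t = target
        while len(t) > len(level):
            t = squeeze(t)
        total += sum((p - q) ** 2 for p, q in zip(level, t)) / len(level)
    return total


if __name__ == "__main__":
    seq = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    fused = coarse_to_fine(build_pyramid(seq, levels=3))
    print(fused[0])   # coarsest (global context): [2.5, 6.5]
    print(fused[-1])  # finest after fusion: [1.5, 2.0, 3.0, 3.5, 5.5, 6.0, 7.0, 7.5]
    print(multi_supervision_loss(fused, seq))  # 0.375
```

The point of the toy is the data flow: each upward step halves the temporal resolution, and each downward step injects the coarser, more global summary back into a finer level, so every level can receive its own supervision signal.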