Another Vertical View: A Hierarchical Network for Heterogeneous Trajectory Prediction via Spectrums

With the fast development of AI-related techniques, the applications of trajectory prediction are no longer limited to easier scenes and trajectories. More and more heterogeneous trajectories with different representation forms, such as 2D or 3D coordinates, 2D or 3D bounding boxes, and even high-dimensional human skeletons, need to be analyzed and forecasted. Among these heterogeneous trajectories, interactions between different elements within a frame of trajectory, which we call the ``Dimension-Wise Interactions'', would be more complex and challenging. However, most previous approaches focus mainly on a specific form of trajectories, which means these methods could not be used to forecast heterogeneous trajectories, not to mention the dimension-wise interaction. Besides, previous methods mostly treat trajectory prediction as a normal time sequence generation task, indicating that these methods may require more work to directly analyze agents' behaviors and social interactions at different temporal scales. In this paper, we bring a new ``view'' for trajectory prediction to model and forecast trajectories hierarchically according to different frequency portions from the spectral domain to learn to forecast trajectories by considering their frequency responses. Moreover, we try to expand the current trajectory prediction task by introducing the dimension $M$ from ``another view'', thus extending its application scenarios to heterogeneous trajectories vertically. Finally, we adopt the bilinear structure to fuse two factors, including the frequency response and the dimension-wise interaction, to forecast heterogeneous trajectories via spectrums hierarchically in a generic way. Experiments show that the proposed model outperforms most state-of-the-art methods on ETH-UCY, Stanford Drone Dataset and nuScenes with heterogeneous trajectories, including 2D coordinates, 2D and 3D bounding boxes.

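In plain terms: as AI-related techniques advance quickly, trajectory prediction is no longer confined to simple scenes and simple trajectories. A growing range of heterogeneous trajectories with different representations (2D or 3D coordinates, 2D or 3D bounding boxes, even high-dimensional human skeletons) has to be analyzed and forecast. In such trajectories, the interactions among the elements within a single frame, which the authors call "dimension-wise interactions", become more complex and harder to model. Most earlier approaches target one specific trajectory form, so they cannot forecast heterogeneous trajectories, let alone capture dimension-wise interactions. Earlier methods also mostly treat trajectory prediction as an ordinary time-series generation task, so they may need extra work to analyze agents' behaviors and social interactions directly at different temporal scales. The paper offers a new "view" of trajectory prediction: it models and forecasts trajectories hierarchically, by frequency portion in the spectral domain, and learns to forecast from their frequency responses. It also introduces the dimension M from "another view", which extends the task vertically to heterogeneous trajectories. Finally, it uses a bilinear structure to fuse the two factors, frequency response and dimension-wise interaction, so that heterogeneous trajectories are forecast hierarchically via spectrums in a generic way. Experiments show that the proposed model outperforms most state-of-the-art methods on ETH-UCY, the Stanford Drone Dataset, and nuScenes, with heterogeneous trajectories that include 2D coordinates and 2D and 3D bounding boxes.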
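To make the two ideas in the abstract concrete, the sketch below treats a trajectory as an array of shape (T, M), where M is the per-frame dimension, and moves it into the spectral domain along the time axis. It is only an illustration under stated assumptions: the abstract does not name the transform, so a discrete Fourier transform is assumed here; the M values in `FORMS`, the helper names, and the synthetic data are all hypothetical; and the bilinear fusion step and the network itself are not modeled.

```python
import numpy as np

# Hypothetical per-frame dimensions M for a few trajectory forms
# (illustrative values, not taken from the paper).
FORMS = {"2d_coordinate": 2, "2d_bounding_box": 4, "3d_bounding_box": 6}


def to_spectrum(traj):
    """DFT along the time axis of a real (T, M) trajectory.

    Returns a complex array of shape (T // 2 + 1, M).
    Assumption: DFT stands in for whatever transform the paper uses.
    """
    return np.fft.rfft(traj, axis=0)


def low_pass(traj, keep):
    """Rebuild a (T, M) trajectory from its `keep` lowest-frequency bins."""
    spec = to_spectrum(traj)
    spec[keep:] = 0  # discard the higher-frequency portion
    return np.fft.irfft(spec, n=traj.shape[0], axis=0)


def make_demo_trajectory(m, steps=8, seed=0):
    """Synthetic (steps, m) trajectory: linear drift plus small noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(steps)[:, None]
    drift = t * np.linspace(0.5, 1.0, m)[None, :]
    return drift + 0.05 * rng.standard_normal((steps, m))


for name, m in FORMS.items():
    traj = make_demo_trajectory(m)
    spec = to_spectrum(traj)
    coarse = low_pass(traj, keep=2)              # low-frequency trend only
    full = low_pass(traj, keep=spec.shape[0])    # all bins kept
    print(name, traj.shape, spec.shape, coarse.shape, np.allclose(full, traj))
```

The same two functions work for every M without modification, which is the sense in which a spectral treatment can be generic across trajectory forms. Keeping only a few low-frequency bins gives a coarse trend, and adding the remaining bins restores the fine detail, a rough analogue of forecasting hierarchically by frequency portion.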