Multi-pedestrian trajectory prediction is an indispensable safety element of autonomous systems that interact with crowds in unstructured environments. Many recent efforts have developed trajectory prediction algorithms that focus on understanding the social norms behind pedestrian motion. However, we observe that these works usually rely on two assumptions that prevent them from being applied smoothly to robot applications: (1) the positions of all pedestrians are consistently tracked, and (2) the target agent pays attention to all pedestrians in the scene. With incomplete pedestrian data, the first assumption leads to biased interaction modeling; the second introduces unnecessary disturbances and leads to the freezing robot problem. We therefore propose the Gumbel Social Transformer, in which an Edge Gumbel Selector samples a sparse interaction graph of partially observed pedestrians at each time step. A Node Transformer Encoder and a Masked LSTM then encode the pedestrian features with the sampled sparse graphs to predict trajectories. We demonstrate that our model overcomes the potential problems caused by these assumptions and that our approach outperforms related works in benchmark evaluation.
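To make the idea of sampling a sparse interaction graph over partially observed pedestrians concrete, the following is a minimal sketch using the generic Gumbel-softmax trick in plain Python. It is not the Edge Gumbel Selector itself, whose architecture is not described in this abstract. The function names, the `edge_logits` scores, the `observed` mask, the temperature `tau`, and the choice of exactly one neighbor per pedestrian are all assumptions made for illustration.

```python
import math
import random


def sample_gumbel(rng):
    # Gumbel(0, 1) noise via inverse transform sampling: -log(-log(U)).
    # U is clamped away from 0 and 1 so both logarithms stay finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax(logits, tau, rng):
    # Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau).
    noisy = [(l + sample_gumbel(rng)) / tau for l in logits]
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    z = sum(exps)
    return [e / z for e in exps]


def sample_sparse_edges(edge_logits, observed, tau=0.5, seed=0):
    """Sample a sparse directed graph for one time step (illustrative only).

    edge_logits[i][j]: hypothetical score for pedestrian i attending to j.
    observed[i]: True if pedestrian i is tracked at this time step.
    Returns a 0/1 adjacency matrix in which every observed pedestrian that
    has at least one observed candidate selects exactly one neighbor.
    """
    rng = random.Random(seed)
    n = len(observed)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        if not observed[i]:
            continue  # untracked pedestrians get no outgoing edges
        # Only observed pedestrians other than i are valid neighbors.
        candidates = [j for j in range(n) if j != i and observed[j]]
        if not candidates:
            continue
        probs = gumbel_softmax([edge_logits[i][j] for j in candidates], tau, rng)
        # Hard selection: keep the candidate with the largest relaxed weight.
        best = max(range(len(probs)), key=probs.__getitem__)
        adj[i][candidates[best]] = 1
    return adj
```

In this sketch, an unobserved pedestrian neither sends nor receives edges, so missing tracks cannot bias the sampled graph, and each observed pedestrian attends to a single neighbor rather than to the whole scene. A trainable version would typically keep the relaxed weights for the backward pass (a straight-through estimator) instead of the hard argmax alone.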