Learning socially-aware motion representations is at the core of recent advances in multi-agent problems, such as human motion forecasting and robot navigation in crowds. Despite promising progress, existing representations learned with neural networks still struggle to generalize in closed-loop predictions (e.g., they output colliding trajectories). This issue largely arises from the non-i.i.d. nature of sequential prediction in conjunction with ill-distributed training data. Intuitively, if the training data comes only from human behaviors in safe spaces, i.e., from "positive" examples, it is difficult for learning algorithms to capture the notion of "negative" examples like collisions. In this work, we aim to address this issue by explicitly modeling negative examples through self-supervision: (i) we introduce a social contrastive loss that regularizes the extracted motion representation by discerning the ground-truth positive events from synthetic negative ones; (ii) we construct informative negative samples based on our prior knowledge of rare but dangerous circumstances. Our method substantially reduces the collision rates of recent trajectory forecasting, behavioral cloning and reinforcement learning algorithms, outperforming state-of-the-art methods on several benchmarks. Our code is available at https://github.com/vita-epfl/social-nce.
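To make points (i) and (ii) concrete, the following is a minimal sketch rather than the released implementation. The abstract does not state the form of the loss, so the sketch assumes an InfoNCE-style objective, a common choice for contrastive learning. The names `social_negatives` and `info_nce`, the `radius`, `n_dirs` and `temperature` parameters, and the use of unit 2D vectors in place of learned embeddings are all illustrative assumptions.

```python
import math


def social_negatives(neighbor_future, radius=0.2, n_dirs=8):
    """Hypothetical sampler: place synthetic negatives on a small circle
    around each neighbor's future position, i.e. spots where the agent of
    interest would be uncomfortably close to (or colliding with) a neighbor."""
    negatives = []
    for x, y in neighbor_future:
        for k in range(n_dirs):
            theta = 2.0 * math.pi * k / n_dirs
            negatives.append((x + radius * math.cos(theta),
                              y + radius * math.sin(theta)))
    return negatives


def info_nce(query, positive_key, negative_keys, temperature=0.1):
    """Assumed InfoNCE-style loss: negative log-softmax of the positive
    similarity among the positive and all negative similarities."""
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    # The positive logit comes first, followed by one logit per negative.
    logits = [dot(query, positive_key) / temperature]
    logits += [dot(query, key) / temperature for key in negative_keys]

    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# Toy usage: in a real model the query would be a learned embedding of the
# observed scene and the keys learned embeddings of candidate future events.
# Here plain 2D unit vectors stand in for those embeddings.
query = (1.0, 0.0)
positive = (1.0, 0.0)                       # stands in for the ground-truth event
negatives = [(-1.0, 0.0), (0.0, 1.0)]       # stand in for synthetic unsafe events
loss = info_nce(query, positive, negatives)
```

The loss shrinks as the query becomes more similar to the positive key and less similar to the negative keys, which is how a term of this kind could push a representation away from states resembling collisions. In practice it would be added as a regularizer to the main forecasting or policy loss.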