Human trajectory forecasting in crowds is, at its core, a sequence prediction problem, with the specific challenges of capturing inter-sequence dependencies (social interactions) and, consequently, predicting socially compliant multimodal distributions. In recent years, neural network-based methods have been shown to outperform hand-crafted methods on distance-based metrics. However, these data-driven methods still suffer from one crucial limitation: a lack of interpretability. To overcome this limitation, we leverage the power of discrete choice models to learn interpretable rule-based intents, and subsequently utilise the expressivity of neural networks to model the scene-specific residual. Extensive experimentation on the interaction-centric benchmark TrajNet++ demonstrates the effectiveness of our proposed architecture in explaining its predictions without compromising accuracy.
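The abstract does not specify the choice set, the utility terms or the residual network, so the following is only a rough sketch of the general idea under stated assumptions. It uses a multinomial logit (a standard discrete choice model) over a hypothetical set of 16 candidate next-step velocities, scores each candidate with a hand-picked set of rule-based utility terms (keep heading, head towards the goal, keep speed, avoid neighbours), takes the most probable candidate as the interpretable "intent", and adds a correction from a placeholder `residual_fn` that stands in for the neural network. None of these names, terms or weights come from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, float]

# Hypothetical choice set: 8 headings x 2 speed multipliers (not from the paper).
HEADINGS = [2.0 * math.pi * k / 8 for k in range(8)]
SPEED_FACTORS = [0.5, 1.0]


def norm(v: Vec) -> float:
    return math.hypot(v[0], v[1])


def cos_sim(a: Vec, b: Vec) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    na, nb = norm(a), norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return (a[0] * b[0] + a[1] * b[1]) / (na * nb)


def alternatives(velocity: Vec) -> List[Vec]:
    """Enumerate candidate next-step velocities relative to the current speed."""
    speed = norm(velocity)
    return [(f * speed * math.cos(h), f * speed * math.sin(h))
            for h in HEADINGS for f in SPEED_FACTORS]


def utility(alt: Vec, velocity: Vec, goal_dir: Vec,
            neighbours: Sequence[Vec], beta: Sequence[float]) -> float:
    """Linear-in-parameters utility built from interpretable rules (hypothetical terms)."""
    keep_direction = cos_sim(alt, velocity)          # rule 1: keep current heading
    towards_goal = cos_sim(alt, goal_dir)            # rule 2: head towards the goal
    keep_speed = -abs(norm(alt) - norm(velocity))    # rule 3: keep current speed
    # Rule 4: penalise ending the step near a neighbour (agent at origin, dt = 1).
    collision = sum(math.exp(-math.dist(alt, n)) for n in neighbours)
    return (beta[0] * keep_direction + beta[1] * towards_goal
            + beta[2] * keep_speed - beta[3] * collision)


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax, i.e. multinomial logit choice probabilities."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


ResidualFn = Callable[[List[Vec], List[float]], Vec]


def predict_step(velocity: Vec, goal_dir: Vec, neighbours: Sequence[Vec],
                 beta: Sequence[float],
                 residual_fn: ResidualFn = lambda alts, probs: (0.0, 0.0),
                 ) -> Tuple[Vec, List[float]]:
    """Rule-based logit intent plus an additive residual correction."""
    alts = alternatives(velocity)
    probs = softmax([utility(a, velocity, goal_dir, neighbours, beta) for a in alts])
    # Interpretable part: the most probable intent (first one wins on ties).
    best = max(range(len(alts)), key=lambda i: probs[i])
    intent = alts[best]
    # Residual part: placeholder for a neural network (hypothetical interface).
    dx, dy = residual_fn(alts, probs)
    return (intent[0] + dx, intent[1] + dy), probs


if __name__ == "__main__":
    beta = (1.0, 1.0, 1.0, 10.0)
    free, _ = predict_step((1.0, 0.0), (1.0, 0.0), [], beta)
    blocked, _ = predict_step((1.0, 0.0), (1.0, 0.0), [(1.0, 0.0)], beta)
    print("free path:", free)
    print("neighbour ahead:", blocked)
```

Because each utility is linear in `beta`, every fitted coefficient in a setup like this can be read as the weight placed on one named rule, and the residual only has to capture what those rules miss. In the example, a neighbour standing directly ahead makes the straight-ahead candidate unattractive, and the logit model switches to a sideways intent.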