Language allows humans to build mental models that interpret what is happening around them, resulting in more accurate long-term predictions. We present a novel trajectory prediction model that uses linguistic intermediate representations to forecast trajectories and is trained on trajectory samples with partially annotated captions. The model learns the meaning of each word without direct per-word supervision. At inference time, it generates a linguistic description of the trajectories that captures maneuvers and interactions over an extended time interval. This generated description is then used to refine the predicted trajectories of multiple agents. We train and validate our model on the Argoverse dataset and demonstrate improved trajectory prediction accuracy. In addition, our model is more interpretable: it presents part of its reasoning in plain language as captions, which can aid model development and help build confidence in the model before deployment.