Predicting the Next Action by Modeling the Abstract Goal

Anticipating human actions is an inherently uncertain problem. However, we can reduce this uncertainty if we have a sense of the goal that the actor is trying to achieve. Here, we present an action anticipation model that leverages goal information to reduce the uncertainty in future predictions. Since neither goal labels nor the observed action labels are available during inference, we rely on visual representations to encapsulate information about both actions and goals. From these, we derive a novel concept called the abstract goal, which is conditioned on observed sequences of visual features for action anticipation. We design the abstract goal as a distribution whose parameters are estimated using a variational recurrent network. We sample multiple candidates for the next action and introduce a goal consistency measure to determine the best candidate that follows from the abstract goal. We evaluate our method on the challenging Epic-Kitchens55 (EK55), EK100, and EGTEA Gaze+ datasets. On the seen kitchens (S1) set of EK55, we obtain absolute improvements of +13.69, +11.24, and +5.19 in Top-1 verb, Top-1 noun, and Top-1 action anticipation accuracy, respectively, over prior state-of-the-art methods. We likewise obtain significant improvements on the unseen kitchens (S2) set for Top-1 verb (+10.75), noun (+5.84), and action (+2.87) anticipation. A similar trend is observed on the EGTEA Gaze+ dataset, where we obtain absolute improvements of +9.9, +13.1, and +6.8 for noun, verb, and action anticipation, respectively. As of this submission, our method is the new state of the art for action anticipation on EK55 and EGTEA Gaze+ (leaderboard: https://competitions.codalab.org/competitions/20071#results ). Code is available at https://github.com/debadityaroy/Abstract_Goal
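The sample-then-select step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not define the goal consistency measure, so the code below assumes, purely for illustration, that the abstract goal is a diagonal Gaussian and that consistency is scored by a symmetric KL divergence between the abstract goal and the goal distribution implied by each candidate. The function names (`kl_diag_gauss`, `goal_inconsistency`, `sample_candidates`, `select_best`) and the toy parameters are hypothetical; in the paper the distribution parameters come from a variational recurrent network, which is replaced here by hard-coded values.

```python
import math
import random

def kl_diag_gauss(mu_p, sd_p, mu_q, sd_q):
    """KL(p || q) for diagonal Gaussians given per-dimension means and std devs."""
    return sum(
        math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
        for mp, sp, mq, sq in zip(mu_p, sd_p, mu_q, sd_q)
    )

def goal_inconsistency(goal, candidate):
    """Assumed stand-in measure: symmetric KL between the abstract goal and
    the goal distribution implied by a candidate (lower means more consistent)."""
    (mu_g, sd_g), (mu_c, sd_c) = goal, candidate
    return kl_diag_gauss(mu_g, sd_g, mu_c, sd_c) + kl_diag_gauss(mu_c, sd_c, mu_g, sd_g)

def sample_candidates(goal, k, rng):
    """Draw k candidates with the reparameterization z = mu + sd * eps.
    Each candidate reuses the goal's std devs, which is a simplification."""
    mu, sd = goal
    return [
        ([m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sd)], list(sd))
        for _ in range(k)
    ]

def select_best(goal, candidates):
    """Return the index of the most goal-consistent candidate and all scores."""
    scores = [goal_inconsistency(goal, c) for c in candidates]
    best = min(range(len(candidates)), key=scores.__getitem__)
    return best, scores

# Toy abstract goal (means, std devs); hypothetical values, not learned parameters.
abstract_goal = ([0.0, 1.0], [1.0, 0.5])
rng = random.Random(0)  # fixed seed so the sketch is reproducible
candidates = sample_candidates(abstract_goal, k=5, rng=rng)
best_index, scores = select_best(abstract_goal, candidates)
print(best_index, [round(s, 3) for s in scores])
```

Sampling several candidates and keeping only the most consistent one is what lets a stochastic model commit to a single prediction; a different divergence or a learned scoring function could be swapped in for `goal_inconsistency` without changing the surrounding logic.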
