We present a novel approach for the visual prediction of human-object interactions in videos. Rather than forecasting the human and object motion or the future hand-object contact points, we aim at predicting (a) the class of the on-going human-object interaction and (b) the class(es) of the next active object(s) (NAOs), i.e., the object(s) that will be involved in the interaction in the near future, as well as the time at which the interaction will occur. The proposed graph matching relies on the efficient Graph Edit Distance (GED) method. The experimental evaluation of the proposed approach was conducted using two well-established video datasets that contain human-object interactions, namely MSR Daily Activities and CAD120. High prediction accuracy was obtained for both action prediction and NAO forecasting.
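To make the role of GED concrete, the following is a minimal illustrative sketch (not the paper's implementation) of comparing two small, hypothetical scene graphs with the `graph_edit_distance` routine from the `networkx` library; the node names and graph structure are assumptions for illustration only.

```python
# Illustrative sketch, not the paper's implementation: graph edit
# distance between two hypothetical scene graphs using networkx.
import networkx as nx

# Hypothetical scene graphs: nodes are entities (hand, objects),
# edges encode contact/spatial relations.
g_observed = nx.Graph()
g_observed.add_edges_from([("hand", "cup"), ("cup", "table")])

g_model = nx.Graph()
g_model.add_edges_from([("hand", "cup"), ("cup", "table"), ("hand", "spoon")])

# GED = minimum total cost of node/edge insertions, deletions, and
# substitutions that transform one graph into the other.
d = nx.graph_edit_distance(g_observed, g_model)
print(d)  # 2.0: insert node "spoon" and edge ("hand", "spoon")
```

With default (unit) edit costs, a lower GED means the observed graph is closer to a stored interaction model, which is what makes GED usable as a matching score.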