We introduce the first benchmark for a new problem, recognizing human action adverbs (HAA): "Adverbs Describing Human Actions" (ADHA). This is a first step for computer vision in moving from pattern recognition toward real AI. We present the key features of ADHA: a semantically complete set of adverbs describing human actions, a set of common, describable human actions, and an exhaustive labeling of the actions that occur simultaneously in each video. We conduct an in-depth analysis of how currently effective action recognition and image captioning models perform when applied to adverb recognition, and the results show that such methods are unsatisfactory. Moreover, we propose a novel three-stream hybrid model to address the HAA problem, which achieves better results.