Human activity understanding is of widespread interest in artificial intelligence and spans diverse applications such as health care and behavior analysis. Despite advances in deep learning, it remains challenging. Object-recognition-like solutions usually try to map pixels directly to semantics, but activity patterns differ markedly from object patterns, which has kept this approach from repeating its success in object recognition. In this work, we propose a novel paradigm that reformulates the task in two stages: first mapping pixels to an intermediate space spanned by atomic activity primitives, then programming the detected primitives with interpretable logic rules to infer semantics. To obtain a representative primitive space, we build a knowledge base containing over 26M primitive labels, together with logic rules derived from human priors or discovered automatically. Our framework, Human Activity Knowledge Engine (HAKE), outperforms canonical methods in both generalization ability and performance on challenging benchmarks. Code and data are available at http://hake-mvig.cn/.
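To make the second stage concrete, the toy sketch below takes confidence scores for detected primitives and combines them with logic rules to score activities. Everything here is an illustrative assumption rather than HAKE's actual implementation: the primitive names, the rules, and the choice of fuzzy logic (minimum for AND within a rule, maximum for OR across rules with the same head) are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical stage-1 output: confidence scores for atomic activity primitives
# (e.g. part-level states such as "hand holds cup"). Names are illustrative only.
PrimitiveScores = Dict[str, float]

# A rule is (activity, primitives that must all hold). These are hypothetical
# examples, not rules taken from the HAKE knowledge base.
RULES: List[Tuple[str, List[str]]] = [
    ("drink_with_cup", ["hand_hold_cup", "head_close_cup"]),
    ("carry_cup", ["hand_hold_cup", "foot_walk"]),
    ("ride_bicycle", ["hip_sit_on_bicycle", "foot_tread_bicycle"]),
]


def infer_activities(
    scores: PrimitiveScores, rules: List[Tuple[str, List[str]]] = RULES
) -> Dict[str, float]:
    """Stage 2 (assumed fuzzy semantics): AND = min over a rule body,
    OR = max across rules that share the same activity head."""
    result: Dict[str, float] = {}
    for activity, body in rules:
        # Primitives that were not detected are treated as absent (score 0.0).
        conjunction = min(scores.get(p, 0.0) for p in body)
        result[activity] = max(result.get(activity, 0.0), conjunction)
    return result


if __name__ == "__main__":
    detected = {"hand_hold_cup": 0.9, "head_close_cup": 0.8, "foot_walk": 0.1}
    ranked = sorted(infer_activities(detected).items(), key=lambda kv: -kv[1])
    for activity, score in ranked:
        print(f"{activity}: {score:.2f}")
```

With these inputs, `drink_with_cup` scores 0.80, `carry_cup` 0.10, and `ride_bicycle` 0.00. Because each activity score traces back to named primitives and an explicit rule, the decision can be inspected, which is the kind of interpretability the two-stage design aims for.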