In this paper we address the task of recognizing assembly actions as a structure (e.g., a piece of furniture or a toy block tower) is built up from a set of primitive objects. Recognizing the full range of assembly actions requires perception at a level of spatial detail that has not been attempted in the action recognition literature to date. We extend the fine-grained activity recognition setting to address assembly action recognition in its full generality by unifying assembly actions and kinematic structures within a single framework. We use this framework to develop a general method for recognizing assembly actions from observation sequences, along with observation features that take advantage of a spatial assembly's special structure. Finally, we evaluate our method empirically on two application-driven data sources: (1) an IKEA furniture-assembly dataset, and (2) a block-building dataset. On the first, our system recognizes assembly actions with an average framewise accuracy of 70% and an average normalized edit distance of 10%. On the second, which requires fine-grained geometric reasoning to distinguish between assemblies, our system attains an average normalized edit distance of 23%, a relative improvement of 69% over prior work.
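The two reported metrics, framewise accuracy and normalized edit distance, can be illustrated with a minimal sketch. The abstract does not state how the edit distance is normalized, so the code below assumes a common convention: frame-level labels are collapsed into segment-level sequences, a Levenshtein distance is computed between them, and the result is divided by the length of the longer sequence. All function names are hypothetical and are not taken from the paper.

```python
from itertools import groupby
from typing import Hashable, List, Sequence


def framewise_accuracy(pred: Sequence[Hashable], true: Sequence[Hashable]) -> float:
    """Fraction of frames whose predicted label matches the ground truth."""
    if len(pred) != len(true):
        raise ValueError("pred and true must have the same number of frames")
    if len(true) == 0:
        raise ValueError("sequences must be non-empty")
    return sum(p == t for p, t in zip(pred, true)) / len(true)


def collapse_segments(frames: Sequence[Hashable]) -> List[Hashable]:
    """Collapse runs of identical frame labels into one label per segment."""
    return [label for label, _ in groupby(frames)]


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if labels match)
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: Sequence[Hashable], true: Sequence[Hashable]) -> float:
    """Segment-level edit distance divided by the longer segment sequence.

    ASSUMPTION: this normalization is a common convention, not necessarily
    the one used in the paper.
    """
    pred_seg = collapse_segments(pred)
    true_seg = collapse_segments(true)
    denom = max(len(pred_seg), len(true_seg))
    if denom == 0:
        return 0.0
    return edit_distance(pred_seg, true_seg) / denom
```

Under these assumptions, a prediction that misplaces a segment boundary by a few frames lowers framewise accuracy but leaves the edit distance at zero, while a prediction that skips an action entirely is penalized by both metrics. This is why the two numbers are typically reported together.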