Everyday human environments are populated with articulated objects. Existing Category-level Articulation Pose Estimation (CAPE) methods are studied under a single-instance setting with a fixed kinematic structure for each category. To address these limitations, we reformulate the problem for real-world environments and propose the CAPE-Real (CAPER) task setting, which allows varied kinematic structures within a semantic category and multiple instances co-existing in a real-world observation. To support this task, we build an articulated model repository, ReArt-48, and present an efficient dataset generation pipeline comprising Fast Articulated Object Modeling (FAOM) and the Semi-Authentic MixEd Reality Technique (SAMERT). With this pipeline, we construct a large-scale mixed-reality dataset, ReArtMix, and a real-world dataset, ReArtVal. We further propose an effective framework, ReArtNOCS, that exploits RGB-D input to estimate part-level poses for multiple instances in a single forward pass. Extensive experiments demonstrate that ReArtNOCS achieves strong performance under both the CAPER and CAPE settings, and we believe it can serve as a strong baseline for future research on the CAPER task.