Action understanding is important and has attracted considerable attention. It can be formulated as a mapping from the physical space of actions to a semantic space. Typically, researchers build action datasets by defining classes according to idiosyncratic choices, each pushing the envelope of its own benchmark. As a result, datasets are incompatible with one another, like "isolated islands", owing to semantic gaps and differing class granularities, e.g., "do housework" in dataset A versus "wash plate" in dataset B. We argue that a more principled semantic space is urgently needed to concentrate community efforts and let us use all datasets together in pursuit of generalizable action learning. To this end, we design a Poincaré action semantic space that is built on a verb taxonomy hierarchy and covers a massive number of actions. By aligning the classes of previous datasets to our semantic space, we gather image, video, skeleton, and MoCap datasets into a unified database with a unified label system, i.e., bridging the "isolated islands" into a "Pangea". Accordingly, we propose a bidirectional mapping model between the physical and semantic spaces to make full use of Pangea. In extensive experiments, our system shows significant superiority, especially in transfer learning. Code and data will be made publicly available.
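The abstract does not say how the Poincaré action semantic space is constructed. As a rough illustration only, the sketch below computes the standard geodesic distance in the Poincaré ball model, the property that makes such spaces a common choice for embedding hierarchies: the same Euclidean step costs more hyperbolic distance near the boundary, leaving more room for fine-grained leaf verbs. The function name `poincare_distance` and the 2-D coordinates for "do housework" and "wash plate" are hypothetical and are not the paper's embeddings.

```python
import math


def _sq_norm(x):
    # Squared Euclidean norm of a coordinate tuple.
    return sum(c * c for c in x)


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincaré ball.

    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    """
    nu, nv = _sq_norm(u), _sq_norm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = _sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))


# Hypothetical embeddings: a coarse verb near the origin,
# a fine-grained verb pushed toward the boundary.
embeddings = {
    "do housework": (0.0, 0.1),
    "wash plate": (0.0, 0.85),
}

if __name__ == "__main__":
    print(poincare_distance(embeddings["do housework"], embeddings["wash plate"]))
    # The same Euclidean step of 0.1, taken near the origin vs. near the boundary:
    print(poincare_distance((0.0, 0.0), (0.0, 0.1)))
    print(poincare_distance((0.0, 0.8), (0.0, 0.9)))
```

The last two calls cover the same Euclidean gap, but the second yields a noticeably larger distance. This is why coarse classes tend to sit near the origin and specific ones near the rim in hyperbolic label embeddings.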