Action understanding is an important problem that attracts wide attention. It can be formulated as a mapping from the physical space of actions to the semantic space. Typically, researchers have built action datasets with idiosyncratic class definitions, each pushing the envelope of its own benchmark. As a result, datasets are incompatible with each other, like "isolated islands", because of semantic gaps and differing class granularities, e.g., "do housework" in dataset A and "wash plate" in dataset B. We argue that a more principled semantic space is urgently needed to concentrate community efforts and to let all datasets be used together in pursuit of generalizable action learning. To this end, we design a Poincaré action semantic space that is built on a verb taxonomy hierarchy and covers a massive number of actions. By aligning the classes of previous datasets to our semantic space, we gather image, video, skeleton, and MoCap datasets into a unified database under a unified label system, i.e., bridging the "isolated islands" into a "Pangea". Accordingly, we propose a bidirectional mapping model between the physical and semantic spaces to make full use of Pangea. In extensive experiments, our system shows significant superiority, especially in transfer learning. Code and data will be made publicly available.
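To illustrate why a Poincaré space suits a verb hierarchy with mixed granularities, the sketch below computes the standard geodesic distance on the Poincaré ball, $d(u, v) = \operatorname{arcosh}\left(1 + \frac{2\lVert u - v\rVert^2}{(1 - \lVert u\rVert^2)(1 - \lVert v\rVert^2)}\right)$. The abstract does not specify the embedding or the loss, so this is only an assumption-laden illustration: the function name `poincare_distance` and the 2-D coordinates assigned to the classes are hypothetical and are not taken from the paper.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# Hypothetical coordinates (not from the paper): a coarse class sits near the
# origin, fine-grained classes sit near the boundary of the ball.
coarse = (0.1, 0.0)   # e.g. "do housework"
fine_a = (0.9, 0.0)   # e.g. "wash plate"
fine_b = (0.0, 0.9)   # some unrelated fine-grained class

d_parent_child = poincare_distance(coarse, fine_a)
d_siblings = poincare_distance(fine_a, fine_b)
print(f"coarse -> fine_a: {d_parent_child:.3f}")
print(f"fine_a -> fine_b: {d_siblings:.3f}")
```

With these made-up points, the two fine-grained classes end up farther from each other than either is from the coarse class. That is the property that makes hyperbolic spaces a common choice for tree-like label structures: distances grow rapidly toward the boundary, leaving room for many distinct leaves under a shared parent.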