BABEL: Bodies, Action and Behavior with English Labels

Understanding the semantics of human movement -- the what, how and why of the movement -- is an important problem that requires datasets of human actions with semantic labels. Existing datasets take one of two approaches. Large-scale video datasets contain many action labels but do not contain ground-truth 3D human motion. Alternatively, motion-capture (mocap) datasets have precise body motions but are limited to a small number of actions. To address this, we present BABEL, a large dataset with language labels describing the actions being performed in mocap sequences. BABEL consists of action labels for about 43 hours of mocap sequences from AMASS. Action labels are at two levels of abstraction -- sequence labels describe the overall action in the sequence, and frame labels describe all actions in every frame of the sequence. Each frame label is precisely aligned with the duration of the corresponding action in the mocap sequence, and multiple actions can overlap. There are over 28k sequence labels and 63k frame labels in BABEL, which belong to over 250 unique action categories. Labels from BABEL can be leveraged for tasks such as action recognition, temporal action localization, and motion synthesis. To demonstrate the value of BABEL as a benchmark, we evaluate the performance of models on 3D action recognition. We demonstrate that BABEL poses interesting learning challenges that are applicable to real-world scenarios, and can serve as a useful benchmark of progress in 3D action recognition. The dataset, baseline method, and evaluation code are made available and supported for academic research purposes at https://babel.is.tue.mpg.de/.
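To make the two levels of labels concrete, the following is a minimal sketch of how a sequence label and overlapping frame labels could be represented in memory. The class names, field names, times, and example actions are all hypothetical and chosen for illustration only; they are not the dataset's actual file format or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FrameLabel:
    """One action segment (hypothetical); times are seconds from sequence start."""
    action: str
    start_t: float
    end_t: float


@dataclass
class LabeledSequence:
    """A mocap sequence with one sequence label and many frame labels (hypothetical)."""
    sequence_label: str
    frame_labels: List[FrameLabel] = field(default_factory=list)

    def actions_at(self, t: float) -> List[str]:
        """Return every action whose segment covers time t; segments may overlap."""
        return [fl.action for fl in self.frame_labels if fl.start_t <= t < fl.end_t]


# Illustrative data only: "wave" overlaps "walk" between 2.5 s and 4.0 s.
seq = LabeledSequence(
    sequence_label="walk and wave",
    frame_labels=[
        FrameLabel("stand", 0.0, 1.0),
        FrameLabel("walk", 1.0, 5.0),
        FrameLabel("wave", 2.5, 4.0),
    ],
)

print(seq.actions_at(0.5))
print(seq.actions_at(3.0))
```

Storing each frame label as an interval, rather than as one label per frame, is one way to let several actions be active at the same instant while keeping each label aligned with the duration of its action.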
