The availability of large-scale video action understanding datasets has facilitated advances in the interpretation of visual scenes containing people. However, learning to recognise human actions and their social interactions in an unconstrained real-world environment remains a significant challenge when the scene comprises numerous people, the action labels are potentially highly imbalanced and long-tailed, and the input is a stream of sensory data captured from a mobile robot platform, not least owing to the lack of a large-scale dataset that reflects these conditions. In this paper, we introduce JRDB-Act, an extension of the existing JRDB, which is captured by a social mobile manipulator and reflects a real distribution of human daily-life actions in a university campus environment. JRDB-Act is densely annotated with atomic actions and comprises over 2.8M action labels, constituting a large-scale spatio-temporal action detection dataset. Each human bounding box is labeled with one pose-based action label and multiple (optional) interaction-based action labels. Moreover, JRDB-Act provides social group annotations, conducive to the task of grouping individuals based on their interactions in the scene and inferring their social activities (the common activities in each social group). Each annotated label in JRDB-Act is tagged with the annotators' confidence level, which contributes to the development of reliable evaluation strategies. To demonstrate how such annotations can be used effectively, we develop an end-to-end trainable pipeline to learn and infer these tasks, i.e. individual action and social group detection. The data and the evaluation code are publicly available at https://jrdb.erc.monash.edu/.
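As an illustration of the annotation structure described above, the following is a minimal sketch of how one person's labels might be held in memory. It is not the dataset's actual file format or API: the class names, field names, the string-valued confidence levels, and the example action names are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ActionLabel:
    """One annotated label together with the annotators' confidence (hypothetical encoding)."""
    name: str
    confidence: str  # assumed encoding, e.g. "high" or "low"


@dataclass
class PersonAnnotation:
    """Annotations attached to a single human bounding box (hypothetical schema)."""
    box: Tuple[float, float, float, float]  # assumed (x, y, width, height)
    pose_action: ActionLabel                # exactly one pose-based action
    interaction_actions: List[ActionLabel] = field(default_factory=list)  # zero or more
    social_group_id: Optional[int] = None   # None if no group is annotated


def group_members(people: List[PersonAnnotation]) -> Dict[int, List[int]]:
    """Map each social group id to the indices of the people assigned to it."""
    groups: Dict[int, List[int]] = {}
    for index, person in enumerate(people):
        if person.social_group_id is not None:
            groups.setdefault(person.social_group_id, []).append(index)
    return groups


# Illustrative placeholder data, not taken from the dataset.
people = [
    PersonAnnotation(
        box=(10.0, 20.0, 50.0, 120.0),
        pose_action=ActionLabel("walking", "high"),
        interaction_actions=[ActionLabel("talking to someone", "low")],
        social_group_id=0,
    ),
    PersonAnnotation(
        box=(70.0, 22.0, 48.0, 118.0),
        pose_action=ActionLabel("walking", "high"),
        social_group_id=0,
    ),
    PersonAnnotation(
        box=(300.0, 40.0, 40.0, 100.0),
        pose_action=ActionLabel("standing", "high"),
    ),
]
```

Keeping the single pose-based label as a required field and the interaction-based labels as a list that may be empty mirrors the "one plus optional multiple" labelling scheme, and storing a confidence value on every label allows an evaluation to filter or weight labels by it.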