Every day, humans perform many closely related activities that involve subtle discriminative motions, such as putting on a shirt vs. putting on a jacket, or shaking hands vs. giving a high five. Activity recognition by ethical visual AI could provide insights into our patterns of daily life; however, existing activity recognition datasets do not capture the massive diversity of these human activities around the world. To address this limitation, we introduce Collector, a free mobile app to record video while simultaneously annotating objects and activities of consented subjects. This new data collection platform was used to curate the Consented Activities of People (CAP) dataset, the first large-scale, fine-grained activity dataset of people worldwide. The CAP dataset contains 1.45M video clips covering 512 fine-grained activity labels of daily life, collected by 780 subjects in 33 countries. We provide activity classification and activity detection benchmarks for this dataset, and analyze baseline results to gain insight into how people around the world perform common activities. The dataset, benchmarks, evaluation tools, public leaderboards and mobile apps are available for use at visym.github.io/cap.