Human behavior understanding with unmanned aerial vehicles (UAVs) is of great significance for a wide range of applications, which simultaneously brings an urgent demand of large, challenging, and comprehensive benchmarks for the development and evaluation of UAV-based models. However, existing benchmarks have limitations in terms of the amount of captured data, types of data modalities, categories of provided tasks, and diversities of subjects and environments. Here we propose a new benchmark - UAVHuman - for human behavior understanding with UAVs, which contains 67,428 multi-modal video sequences and 119 subjects for action recognition, 22,476 frames for pose estimation, 41,290 frames and 1,144 identities for person re-identification, and 22,263 frames for attribute recognition. Our dataset was collected by a flying UAV in multiple urban and rural districts in both daytime and nighttime over three months, hence covering extensive diversities w.r.t subjects, backgrounds, illuminations, weathers, occlusions, camera motions, and UAV flying attitudes. Such a comprehensive and challenging benchmark shall be able to promote the research of UAV-based human behavior understanding, including action recognition, pose estimation, re-identification, and attribute recognition. Furthermore, we propose a fisheye-based action recognition method that mitigates the distortions in fisheye videos via learning unbounded transformations guided by flat RGB videos. Experiments show the efficacy of our method on the UAV-Human dataset. The project page: https://github.com/SUTDCV/UAV-Human
翻译:与无人驾驶飞行器(无人驾驶飞行器)的人类行为理解对于广泛的应用具有非常重要的意义,这些应用同时为开发和评价基于无人驾驶飞行器的模型提出了庞大、富有挑战性和全面基准的迫切需要,但现有基准在所采集的数据数量、数据模式类型、所提供的任务类别以及主题和环境的多样性方面都存在局限性。我们在这里提议了一个新的基准——无人驾驶飞行器(UAVHHR),用于与无人驾驶飞行器的人行为理解,其中包括67,428个多式视频序列和119个行动识别主题,22,476个配置估计框架,41,290个框架和1,144个个人再识别身份的特征,22,263个属性识别框架。我们的数据集是通过在三个月的白天和夜间在多个城市和农村地区由飞行的无人驾驶飞行器收集的数据,从而涵盖广泛的多样性主题、背景、污点、天气、紫外线、紫外线运动、摄影机运动以及UAVHR的飞行态度。这种全面和具有挑战性的基准将能够促进基于无人驾驶飞行器的人类行为定义的研究,其中包括通过行动识别、改变鱼类活动的方法,从而显示我们通过行动认识、分析方法,并显示我们通过行动的方法,从而显示,从而显示我们如何认识。