Humans have been depicted in a variety of forms since antiquity. For example, sculptures and paintings were the primary media for depicting human beings before the invention of cameras. However, most current human-centric computer vision tasks, such as human pose estimation and human image generation, focus exclusively on natural images of the real world. Artificial humans, such as those in sculptures, paintings, and cartoons, are commonly neglected, so existing models fail in these scenarios. As an abstraction of life, art incorporates humans in both natural and artificial scenes. We take advantage of this and introduce the Human-Art dataset to bridge related tasks in natural and artificial scenarios. Specifically, Human-Art contains 50k high-quality images with over 123k person instances from 5 natural and 15 artificial scenarios, annotated with bounding boxes, keypoints, self-contact points, and text information for humans represented in both 2D and 3D. It is therefore comprehensive and versatile for various downstream tasks. We also provide a rich set of baseline results and detailed analyses for related tasks, including human detection, 2D and 3D human pose estimation, image generation, and motion transfer. Because Human-Art is a challenging dataset, we hope it can provide insights for relevant research and open up new research questions.