We propose a task we call Portrait Interpretation and construct a dataset, Portrait250K, for it. Research on portraits, such as human attribute recognition and person re-identification, has achieved many successes, but existing work generally: 1) may fail to exploit the interrelationships between tasks and the benefits they could bring; 2) designs a separate deep model for each task, which is inefficient; 3) may be unable to meet the need for a unified model and comprehensive perception in real-world scenes. In this paper, the proposed portrait interpretation approaches the perception of humans from a new systematic perspective. We divide the perception of portraits into three aspects, namely Appearance, Posture, and Emotion, and design corresponding sub-tasks for each aspect. Built on the framework of multi-task learning, portrait interpretation requires a comprehensive description of both the static attributes and the dynamic states of portraits. To invigorate research on this new task, we construct a new dataset that contains 250,000 images labeled with identity, gender, age, physique, height, expression, and the posture of the whole body and arms. Our dataset is collected from 51 movies and hence covers extensive diversity. Furthermore, we focus on representation learning for portrait interpretation and propose a baseline that reflects our systematic perspective. We also propose an appropriate metric for this task. Our experimental results demonstrate that combining the tasks related to portrait interpretation can yield benefits. Code and dataset will be made public.
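As a rough illustration of the multi-task framing described above, the following minimal sketch groups the labeled attributes under the three aspects and combines per-task classification losses into one objective. It is not the paper's baseline or metric, neither of which is specified in the abstract. The assignment of labels to aspects, the task names, the use of softmax cross-entropy, and the weighting scheme are all assumptions made for illustration.

```python
import math

# Hypothetical grouping of the eight labels under the three aspects.
# The abstract lists the labels but does not say which aspect each belongs to.
ASPECTS = {
    "Appearance": ["identity", "gender", "age", "physique", "height"],
    "Posture": ["body_posture", "arm_posture"],
    "Emotion": ["expression"],
}

# Flat list of sub-task names derived from the grouping above.
ALL_TASKS = [task for tasks in ASPECTS.values() for task in tasks]


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def multitask_loss(task_logits, task_targets, weights=None):
    """Weighted sum of per-task cross-entropy losses (assumed form of the objective).

    task_logits:  dict mapping task name -> list of class scores
    task_targets: dict mapping task name -> index of the true class
    weights:      optional dict mapping task name -> float; missing tasks default to 1.0
    """
    weights = weights or {}
    total = 0.0
    for task, logits in task_logits.items():
        total += weights.get(task, 1.0) * cross_entropy(logits, task_targets[task])
    return total
```

In a learned model, the per-task scores would typically come from separate heads on top of a shared representation, so that a single forward pass serves every sub-task; the sketch only shows how the losses could be combined.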