Gesture as a language of non-verbal communication has been theoretically established since the 17th century. However, its relevance for the visual arts has been addressed only sporadically. This may be primarily due to the sheer volume of data that traditionally had to be processed by hand. With the steady progress of digitization, though, a growing number of historical artifacts have been indexed and made available to the public, creating a need for automatic retrieval of art-historical motifs with similar body constellations or poses. Because the art domain differs significantly from existing real-world data sets for human pose estimation in its stylistic variance, this presents new challenges. In this paper, we propose a novel approach to estimating human poses in art-historical images. In contrast to previous work that attempts to bridge the domain gap with pre-trained models or through style transfer, we suggest semi-supervised learning for both object and keypoint detection. Furthermore, we introduce a novel domain-specific art data set that includes both bounding box and keypoint annotations of human figures. Our approach achieves significantly better results than methods that use pre-trained models or style transfer.