Gesture as a \enquote*{language} of non-verbal communication has been theoretically established since the 17th century. However, its relevance for the visual arts has been addressed only sporadically. This may be primarily due to the sheer volume of data that traditionally had to be processed by hand. With the steady progress of digitization, though, a growing number of historical artifacts have been indexed and made available to the public, creating a need for automatic retrieval of art-historical motifs with similar body constellations or poses. Because the domain of art, with its style variance, differs significantly from existing real-world data sets for human pose estimation, this presents new challenges. In this paper, we propose a novel approach to estimate human poses in art-historical images. In contrast to previous work that attempts to bridge the domain gap with pre-trained models or through style transfer, we suggest semi-supervised learning for both object and keypoint detection. Furthermore, we introduce a novel domain-specific art data set that includes both bounding box and keypoint annotations of human figures. Our approach achieves significantly better results than methods that use pre-trained models or style transfer.