Despite recent advances in object detection with deep neural networks, these networks still struggle to identify objects in art images such as paintings and drawings. This challenge is known as the cross-depiction problem, and it stems in part from the tendency of neural networks to prioritize an object's texture over its shape. In this paper we propose and evaluate a process for training neural networks to localize objects, specifically people, in art images. We generate a large dataset for training and validation by modifying the images in the COCO dataset using AdaIN style transfer. This dataset is used to fine-tune a Faster R-CNN object detection network, which is then tested on the existing People-Art testing dataset. The result is a significant improvement on the state of the art and a new way forward for creating datasets to train neural networks to process art images.
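To make the style-transfer step concrete, the following is a minimal sketch of the adaptive instance normalization (AdaIN) operation as it is commonly defined: each content feature channel is re-normalized to the mean and standard deviation of the matching style channel. It is an illustration under stated assumptions, not the pipeline used in this paper. Real AdaIN style transfer applies this operation to encoder feature maps and then decodes the result back to an image; here the "features" are toy Python lists, and the names `channel_stats`, `adain`, and `eps` are hypothetical.

```python
import math


def channel_stats(channel, eps=1e-5):
    """Return (mean, std) of one feature channel given as a flat list.

    eps is added to the variance for numerical stability (an assumption
    of this sketch, mirroring common practice).
    """
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return mean, math.sqrt(var + eps)


def adain(content, style, eps=1e-5):
    """Shift and scale each content channel to the style channel's statistics.

    content, style: lists of channels, each channel a flat list of floats.
    Computes std(style) * (x - mean(content)) / std(content) + mean(style)
    independently for every channel.
    """
    stylized = []
    for c_chan, s_chan in zip(content, style):
        c_mean, c_std = channel_stats(c_chan, eps)
        s_mean, s_std = channel_stats(s_chan, eps)
        stylized.append([s_std * (v - c_mean) / c_std + s_mean for v in c_chan])
    return stylized


# Toy example: two channels of four "pixels" each.
content = [[0.0, 1.0, 2.0, 3.0], [10.0, 10.0, 12.0, 12.0]]
style = [[5.0, 7.0, 9.0, 11.0], [-1.0, 1.0, -1.0, 1.0]]
stylized = adain(content, style)
```

After the call, each channel of `stylized` keeps the spatial pattern of the content channel but carries approximately the style channel's mean and standard deviation. Because bounding boxes depend on spatial layout rather than on texture statistics, a transformation of this kind plausibly allows the original COCO annotations to be reused for the stylized images.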