In this work, we point out that the major dilemma of image aesthetics assessment (IAA) comes from the abstract nature of aesthetic labels: a vast variety of distinct contents can correspond to the same aesthetic label. On the one hand, during inference, the IAA model is required to relate these distinct contents to the same aesthetic label. On the other hand, during training, it is hard for the IAA model to learn to distinguish different contents with supervision from aesthetic labels alone, since aesthetic labels are not directly tied to any specific content. To deal with this dilemma, we propose to distill knowledge of the semantic patterns of a wide variety of image contents from multiple pre-trained object classification (POC) models into an IAA model. We expect the combination of multiple POC models to provide sufficient knowledge of diverse image contents, so that the IAA model can more easily learn to relate distinct contents to a limited number of aesthetic labels. Supervising an end-to-end, single-backbone IAA model with the distilled knowledge improves its Spearman rank-order correlation coefficient (SRCC) by 4.8% compared to the same model trained only with ground-truth aesthetic labels. On specific categories of images, the SRCC improvement brought by the proposed method reaches up to 7.2%. A comparison with prior work also shows that our method outperforms 10 previous IAA methods.
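To make the training signal and the evaluation metric concrete, the following is a minimal sketch rather than the method itself. The abstract does not specify the form of the distillation loss, so the mean-squared feature-matching term, the `weight` parameter, and the function names `combined_loss`, `average_ranks`, and `srcc` are all assumptions made for illustration. The SRCC function follows the standard definition (Pearson correlation of average ranks).

```python
from math import sqrt


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def combined_loss(pred_score, true_score, student_feats, teacher_feats, weight=1.0):
    """Hypothetical objective: aesthetic regression plus distillation.

    student_feats[i] is the student's feature vector that should match
    teacher_feats[i], the output of the i-th pre-trained classification
    (POC) teacher. MSE matching is an assumption, not the paper's loss.
    """
    aesthetic = (pred_score - true_score) ** 2
    distill = sum(mse(s, t) for s, t in zip(student_feats, teacher_feats))
    distill /= len(teacher_feats)  # average over teachers
    return aesthetic + weight * distill


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def srcc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Under these assumptions, `srcc(predicted_scores, ground_truth_scores)` would be computed once over a test set, while `combined_loss` would be evaluated per training image with one feature target per teacher.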