Aspect ratio and spatial layout are two of the principal factors that determine the aesthetic value of a photograph. However, incorporating them into traditional convolution-based frameworks for image aesthetics assessment is problematic. The aspect ratio of a photograph is distorted when it is resized or cropped to a fixed dimension to facilitate batch sampling during training. In addition, convolutional filters process information locally and are limited in their ability to model the global spatial layout of a photograph. In this work, we present a two-stage framework based on graph neural networks that addresses both problems jointly. First, we propose a feature-graph representation in which the input image is modelled as a graph, maintaining its original aspect ratio and resolution. Second, we propose a graph neural network architecture that takes this feature-graph as input and uses visual attention to capture the semantic relationships between the different regions of the image. Our experiments show that the proposed framework advances the state of the art in aesthetic score regression on the Aesthetic Visual Analysis (AVA) benchmark.
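The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the backbone, the graph connectivity, or the attention mechanism, so everything below is an assumption made for illustration. Mean patch colours stand in for learned features, the graph is treated as fully connected, and a single scaled dot-product attention layer with random weights stands in for the graph neural network. The names `image_to_feature_graph`, `attention_layer`, `init_params`, and `predict_score` are hypothetical.

```python
import numpy as np


def image_to_feature_graph(image, patch=32):
    """Turn an H x W x C image into graph nodes without resizing it.

    Assumption: each node is the mean colour of one patch (a stand-in for
    learned features). The node grid follows the image's own aspect ratio.
    Border pixels beyond a multiple of `patch` are dropped in this sketch.
    """
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    cropped = image[:rows * patch, :cols * patch]
    patches = cropped.reshape(rows, patch, cols, patch, c)
    feats = patches.mean(axis=(1, 3)).reshape(rows * cols, c)
    # Normalised patch-centre coordinates keep the spatial layout explicit.
    ys, xs = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    pos = np.stack([(ys.ravel() + 0.5) / rows,
                    (xs.ravel() + 0.5) / cols], axis=1)
    return feats, pos, (rows, cols)


def attention_layer(nodes, w_q, w_k, w_v):
    """One round of attention-weighted message passing over all node pairs."""
    q, k, v = nodes @ w_q, nodes @ w_k, nodes @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights


def init_params(in_dim, hidden=8, seed=0):
    """Random (untrained) weights, for illustration only."""
    rng = np.random.default_rng(seed)
    qkv = [rng.normal(scale=0.1, size=(in_dim, hidden)) for _ in range(3)]
    return {"qkv": qkv, "out": rng.normal(scale=0.1, size=hidden)}


def predict_score(image, params, patch=32):
    """Graph construction, one attention layer, mean pooling, linear readout."""
    feats, pos, grid = image_to_feature_graph(image, patch)
    nodes = np.concatenate([feats, pos], axis=1)
    updated, weights = attention_layer(nodes, *params["qkv"])
    pooled = updated.mean(axis=0)  # pooling works for any number of nodes
    return float(pooled @ params["out"]), grid, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(in_dim=5)  # 3 colour channels + 2 coordinates
    for shape in [(128, 256, 3), (256, 128, 3)]:
        _, grid, weights = predict_score(rng.random(shape), params)
        print(grid, weights.shape)
    # prints "(4, 8) (32, 32)" then "(8, 4) (32, 32)"
```

Because the readout pools over however many nodes the image produces, a landscape and a portrait photograph pass through the same parameters without being warped to a common shape, and every node can attend to every other node regardless of distance. A trained model would replace the patch means and random weights with learned components.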