In this paper, we address the task of semantics-guided image outpainting, which completes an image by generating semantically practical content. Unlike most existing image outpainting works, we approach this task by understanding and completing image semantics at the scene graph level. In particular, we propose a novel network, the Scene Graph Transformer (SGT), which takes node and edge features as inputs to model the associated structural information. To better understand and process graph-based inputs, our SGT performs feature attention at both the node and edge levels. Node-level attention treats edges as a relationship regularizer, while edge-level attention uses the co-occurrence of nodes to guide the attention process. We demonstrate that, given a partial input image with its layout and scene graph, our SGT can be applied to expand the scene graph and convert it to a complete layout. Following state-of-the-art layout-to-image conversion works, the outpainting task can then be completed with sufficient and practical semantics introduced. Extensive experiments on the MS-COCO and Visual Genome datasets confirm, both quantitatively and qualitatively, the effectiveness of our proposed SGT and outpainting framework.
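The abstract does not state the SGT attention equations, so the following is only a minimal sketch of one way node-level attention could treat edges as a regularizer: each edge feature is projected to a scalar and added as a bias to the standard scaled dot-product score between the two nodes it connects. The function `node_attention`, the projection vector `w_e`, and all shapes are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def node_attention(nodes, edges, w_q, w_k, w_v, w_e):
    """Node-level attention with an additive edge bias (illustrative assumption).

    nodes: (N, d) node features
    edges: (N, N, d_e) edge features for every ordered node pair
    w_q, w_k, w_v: (d, d) projection matrices
    w_e: (d_e,) hypothetical projection of an edge feature to a scalar bias
    """
    q, k, v = nodes @ w_q, nodes @ w_k, nodes @ w_v
    # Standard scaled dot-product scores between all node pairs: (N, N).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Edge features shift the scores, so relationships shape the attention.
    scores = scores + edges @ w_e
    attn = softmax(scores, axis=-1)
    return attn @ v, attn


# Toy inputs: 4 nodes, 8-dim node features, 3-dim edge features.
rng = np.random.default_rng(0)
N, d, d_e = 4, 8, 3
nodes = rng.normal(size=(N, d))
edges = rng.normal(size=(N, N, d_e))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
w_e = rng.normal(size=d_e)

out, attn = node_attention(nodes, edges, w_q, w_k, w_v, w_e)
```

Here `out` holds one updated feature vector per node and each row of `attn` sums to one. An edge-level counterpart, in which node co-occurrence guides attention over edges, could be sketched analogously, but the abstract gives too little detail to do so without further guessing.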