LayoutBERT: Masked Language Layout Model for Object Insertion

Image compositing is one of the most fundamental steps in creative workflows. It involves taking objects or parts of several images to create a new image, called a composite. Currently, this process is done manually: artists create accurate masks of the objects to be inserted and carefully blend them with the target scene or image, usually with tools such as Photoshop or GIMP. While several works have addressed automatic object selection for mask creation, placing an object within an image with the correct position, scale, and harmony remains difficult and has seen limited exploration. Automatic object insertion in images or designs is hard because it requires understanding both the scene geometry and the color harmony between objects. We propose LayoutBERT for the object insertion task. It uses a novel self-supervised masked language model objective and bidirectional multi-head self-attention. It outperforms previous layout-based likelihood models and shows favorable properties in terms of model capacity. We demonstrate the effectiveness of our approach for object insertion in image compositing and in other settings such as documents and design templates. We further demonstrate the usefulness of the learned representations for layout-based retrieval tasks. We provide both qualitative and quantitative evaluations on datasets from diverse domains, including COCO, PubLayNet, and two new datasets we call Image Layouts and Template Layouts. Image Layouts, which consists of 5.8 million images with layout annotations, is to our knowledge the largest image layout dataset. We also report ablation results on the effect of dataset size, model size, and class sample size for this task.
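The abstract mentions a self-supervised masked language model objective over layouts. As a rough illustration only, the sketch below shows one way a layout could be serialized into tokens and have one object's geometry masked, so that a bidirectional encoder could be trained to predict where and how large that object should be. The five-tokens-per-object scheme, the 32-bin quantization, and every name here (`Box`, `quantize`, `layout_to_tokens`, `mask_for_insertion`, `NUM_BINS`) are hypothetical assumptions, not the paper's actual tokenization or code.

```python
from dataclasses import dataclass
from typing import List, Tuple

MASK = "[MASK]"
NUM_BINS = 32  # hypothetical quantization resolution for coordinates


@dataclass
class Box:
    label: str  # object class, e.g. "person"
    x: float    # normalized left edge in [0, 1]
    y: float    # normalized top edge in [0, 1]
    w: float    # normalized width in [0, 1]
    h: float    # normalized height in [0, 1]


def quantize(v: float, bins: int = NUM_BINS) -> int:
    """Map a normalized coordinate in [0, 1] to an integer bin index."""
    return min(int(v * bins), bins - 1)


def layout_to_tokens(boxes: List[Box]) -> List[str]:
    """Flatten a layout into 5 tokens per object: class, x, y, w, h."""
    tokens: List[str] = []
    for b in boxes:
        tokens.append(b.label)
        tokens.extend(str(quantize(v)) for v in (b.x, b.y, b.w, b.h))
    return tokens


def mask_for_insertion(boxes: List[Box], target: int) -> Tuple[List[str], List[str]]:
    """Keep the target object's class token but mask its 4 geometry tokens.

    Returns the masked input sequence and the hidden geometry tokens,
    which a model would be trained to recover (hypothetical setup).
    """
    tokens = layout_to_tokens(boxes)
    start = target * 5 + 1  # skip the target object's class token
    labels = tokens[start:start + 4]
    masked = tokens[:start] + [MASK] * 4 + tokens[start + 4:]
    return masked, labels


if __name__ == "__main__":
    scene = [Box("person", 0.10, 0.40, 0.20, 0.50),
             Box("dog", 0.55, 0.70, 0.15, 0.20)]
    masked, labels = mask_for_insertion(scene, target=1)
    print(masked)
    print(labels)
```

Masking only the geometry while keeping the class token mirrors the insertion question "given that a dog should appear in this scene, where and at what scale?"; a real system would also need a vocabulary, embeddings, and a Transformer encoder, which are omitted here.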

Related content

MoDELS

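The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems is the premier conference series for model-driven software and systems engineering, organized with the support of ACM-SIGSOFT and IEEE-TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community the opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability. Official website: http://www.modelsconference.org/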

Latest collection of 100+ papers on "Self-Supervised Learning"

Image classification tricks, 17-slide deck: "Bag of Tricks for Image Classification"

[A collection of cross-lingual BERT models] Transfer learning is increasingly going multilingual with language-specific BERT models

[Deep-learning table detection, information extraction, and structuring] "Table Detection, Information Extraction and Structuring using Deep Learning" by Vihar Kurama
