ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions

We present ASSET, a neural architecture that automatically modifies an input high-resolution image according to a user's edits on its semantic segmentation map. Our architecture is based on a transformer with a novel attention mechanism. Our key idea is to sparsify the transformer's attention matrix at high resolutions, guided by dense attention extracted at lower image resolutions. Previous attention mechanisms are either too computationally expensive for high-resolution images or restricted to specific image regions, which hampers long-range interactions. Our attention mechanism is both computationally efficient and effective. The sparsified attention captures long-range interactions and context, which lets it synthesize interesting phenomena in scenes, such as reflections of landscapes onto water or flora consistent with the rest of the landscape, that previous convolutional and transformer approaches could not generate reliably. We present qualitative and quantitative results, along with user studies, demonstrating the effectiveness of our method.
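To make the key idea concrete, here is a minimal pure-Python sketch of attention that is dense at low resolution and sparse at high resolution. It is an illustration under stated assumptions, not ASSET's implementation: tokens are flattened to 1D, each low-resolution token covers `factor` consecutive high-resolution tokens, the sparsity pattern keeps the `top_k` most-attended low-resolution key blocks for each low-resolution query, and autoregressive (causal) masking is omitted. All function names are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_attention_weights(q, k):
    # Full (dense) attention matrix: row i is softmax_j(q_i . k_j / sqrt(d)).
    d = len(q[0])
    return [softmax([dot(qi, kj) / math.sqrt(d) for kj in k]) for qi in q]


def top_k_indices(row, k):
    # Indices of the k largest attention weights in one row.
    return sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]


def guided_sparse_attention(q_low, k_low, q_high, k_high, v_high, factor, top_k):
    """Hypothetical sketch: sparse high-res attention guided by low-res dense attention.

    Assumptions (not taken from the ASSET paper): 1D token layout, high-res token p
    belongs to low-res block p // factor, top-k block selection, no causal mask.
    Returns (outputs, masks) where masks[p] lists the high-res keys query p used.
    """
    # 1) Dense attention at low resolution is cheap because there are few tokens.
    low_weights = dense_attention_weights(q_low, k_low)
    # 2) For each low-res query block, keep only its top_k most-attended key blocks.
    kept_blocks = [top_k_indices(row, top_k) for row in low_weights]
    d = len(q_high[0])
    outputs, masks = [], []
    for p, qp in enumerate(q_high):
        blocks = kept_blocks[p // factor]
        # 3) Expand each kept low-res block into its `factor` high-res key positions.
        keys = sorted(b * factor + r for b in blocks for r in range(factor))
        # 4) Softmax attention restricted to the selected high-res keys only.
        weights = softmax([dot(qp, k_high[j]) / math.sqrt(d) for j in keys])
        out = [sum(w * v_high[j][c] for w, j in zip(weights, keys))
               for c in range(len(v_high[0]))]
        outputs.append(out)
        masks.append(keys)
    return outputs, masks
```

For example, with 4 low-resolution tokens, `factor=2` (so 8 high-resolution tokens) and `top_k=2`, each high-resolution query scores 4 keys instead of 8. In this sketch the cost is roughly (N/f)² low-resolution scores plus N·k·f high-resolution scores, versus N² for dense attention over N high-resolution tokens, so the savings grow when k·f is much smaller than N.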



Attention Mechanism


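The attention mechanism was first proposed in the field of computer vision, but it truly took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment jointly in machine translation; their work is regarded as the first to apply attention to NLP. Attention-based RNN models were then extended to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the general trend of research progress on attention.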
