We present a novel U-Attention vision Transformer for universal texture synthesis. We exploit the natural long-range dependencies enabled by the attention mechanism to synthesize diverse textures while preserving their structures in a single inference pass. We propose a hierarchical hourglass backbone that attends to the global structure and performs patch mapping at varying scales in a coarse-to-fine-to-coarse stream. Complemented by skip connections and convolution designs that propagate and fuse information across scales, our hierarchical U-Attention architecture unifies attention over features from macro structures to micro details and progressively refines synthesis results at successive stages. Our method achieves stronger 2$\times$ synthesis than previous work on both stochastic and structured textures, while generalizing to unseen textures without fine-tuning. Ablation studies demonstrate the effectiveness of each component of our architecture.
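To make the described architecture concrete, below is a minimal sketch (not the authors' implementation) of a U-shaped hourglass attention backbone: self-attention over patch tokens traverses scales coarse-to-fine-to-coarse, and skip connections plus 1$\times$1 convolutions fuse features at matching scales. All module names (`AttnBlock`, `UAttentionSketch`), depths, channel widths, patch sizes, and the 2$\times$ output head are hypothetical illustration choices, not the paper's exact configuration.

```python
# Illustrative sketch only; hyperparameters and module structure are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttnBlock(nn.Module):
    """Self-attention applied across the spatial positions of a feature map."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)         # (B, H*W, C) token sequence
        n = self.norm(t)
        a, _ = self.attn(n, n, n)
        t = t + a                                # residual connection
        return t.transpose(1, 2).reshape(b, c, h, w)


class UAttentionSketch(nn.Module):
    """Hourglass: coarse -> fine -> coarse, with skips fused by 1x1 convolutions."""

    def __init__(self, dim=64, levels=2):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=8, stride=8)     # coarse patch grid
        self.to_fine = nn.ModuleList([AttnBlock(dim) for _ in range(levels)])
        self.mid = AttnBlock(dim)                                   # finest-scale attention
        self.to_coarse = nn.ModuleList([AttnBlock(dim) for _ in range(levels)])
        self.fuse = nn.ModuleList([nn.Conv2d(2 * dim, dim, 1) for _ in range(levels)])
        # Decode the coarse grid to pixels at twice the input size (toy stand-in
        # for a 2x synthesis output).
        self.head = nn.ConvTranspose2d(dim, 3, kernel_size=16, stride=16)

    def forward(self, x):
        x = self.embed(x)
        skips = []
        for blk in self.to_fine:                 # coarse-to-fine: attend, then upsample
            x = blk(x)
            skips.append(x)
            x = F.interpolate(x, scale_factor=2, mode="nearest")
        x = self.mid(x)
        for blk, fuse in zip(self.to_coarse, self.fuse):   # fine-to-coarse refinement
            x = F.avg_pool2d(x, 2)
            x = fuse(torch.cat([x, skips.pop()], dim=1))   # skip connection + conv fusion
            x = blk(x)
        return self.head(x)


if __name__ == "__main__":
    out = UAttentionSketch()(torch.randn(1, 3, 64, 64))
    print(out.shape)                             # torch.Size([1, 3, 128, 128])
```

The key design point mirrored here is that every scale of the hourglass is processed with attention (rather than convolutions alone), so long-range dependencies inform both the global structure at coarse scales and the detail refinement at fine scales, while the skip-and-fuse path carries information between the two halves of the stream.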