DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation

One key challenge of exemplar-guided image generation lies in establishing fine-grained correspondences between the input and the guiding exemplar image. Prior approaches, despite their promising results, have relied either on estimating dense attention to compute per-point matching, which is limited to coarse scales because of the quadratic memory cost, or on fixing the number of correspondences to achieve linear complexity, which lacks flexibility. In this paper, we propose a Transformer model based on dynamic sparse attention, termed Dynamic Sparse Transformer (DynaST), to achieve fine-level matching with favorable efficiency. The heart of our approach is a novel dynamic-attention unit, dedicated to accommodating variation in the optimal number of tokens each position should attend to. Specifically, DynaST leverages the multi-layer nature of the Transformer structure and applies the dynamic attention scheme in a cascaded manner to refine matching results and synthesize visually pleasing outputs. In addition, we introduce a unified training objective for DynaST, making it a versatile reference-based image translation framework for both supervised and unsupervised scenarios. Extensive experiments on three applications (pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer) demonstrate that DynaST achieves superior performance in local details, outperforming the state of the art while significantly reducing the computational cost. Our code is available at https://github.com/Huage001/DynaST
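To make the idea of a per-position, variable number of attended tokens concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a simple two-step rule that the abstract does not specify: each query first selects its `top_k` best-matching keys, then prunes candidates whose weight relative to the best match falls below `keep_ratio`. The function name `dynamic_sparse_attention` and both parameters are hypothetical. Scores against all keys are computed here for clarity only, so this sketch does not reproduce the memory savings the paper reports.

```python
import math

def dynamic_sparse_attention(queries, keys, values, top_k=4, keep_ratio=0.5):
    """Toy attention in which each query keeps a variable number of keys.

    queries: list of d-dimensional vectors
    keys:    list of d-dimensional vectors
    values:  list of vectors, one per key
    Returns (outputs, counts), where counts[i] is how many keys query i kept.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs, counts = [], []
    for q in queries:
        # Scaled dot-product score of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        # Step 1 (assumed): restrict attention to the top_k best-scoring keys.
        cand = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)[:top_k]
        best = scores[cand[0]]
        # Unnormalized softmax weights relative to the best match (best gets 1.0).
        weights = [math.exp(scores[j] - best) for j in cand]
        # Step 2 (assumed): drop candidates that are weak relative to the best one,
        # so the number of attended tokens differs from query to query.
        kept = [(j, w) for j, w in zip(cand, weights) if w >= keep_ratio]
        total = sum(w for _, w in kept)
        # Weighted average of the values of the surviving keys.
        out = [sum(w * values[j][c] for j, w in kept) / total for c in range(dim)]
        outputs.append(out)
        counts.append(len(kept))
    return outputs, counts

queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[4.0, 0.0], [0.0, 4.0], [3.9, 0.0]]
values = [[1.0], [2.0], [3.0]]
outputs, counts = dynamic_sparse_attention(queries, keys, values, top_k=2)
```

In this toy input the first query has two nearly equally good matches and keeps both, while the second has one clear match and keeps only that one, which is the kind of per-position variation the dynamic-attention unit is described as handling.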
