The task of image-based virtual try-on aims to transfer a target clothing item onto the corresponding region of a person, and is commonly tackled by fitting the item to the desired body part and fusing the warped item with the person. While a growing number of studies have been conducted, the resolution of synthesized images remains limited to low resolutions (e.g., 256x192), which is a critical limitation for satisfying online consumers. We argue that this limitation stems from several challenges: as the resolution increases, artifacts in the misaligned areas between the warped clothes and the desired clothing regions become noticeable in the final results, and the architectures used in existing methods perform poorly at generating high-quality body parts and maintaining the sharpness of clothing textures. To address these challenges, we propose a novel virtual try-on method, VITON-HD, that successfully synthesizes 1024x768 virtual try-on images. Specifically, we first prepare a segmentation map to guide the virtual try-on synthesis, and then roughly fit the target clothing item to the given person's body. Next, we propose ALIgnment-Aware Segment (ALIAS) normalization and the ALIAS generator to handle the misaligned areas and preserve the details of the 1024x768 inputs. Through rigorous comparisons with existing methods, we demonstrate that VITON-HD significantly surpasses the baselines in terms of synthesized image quality, both qualitatively and quantitatively. Code is available at https://github.com/shadow2496/VITON-HD.
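To make the role of ALIAS normalization concrete, the sketch below illustrates one plausible way an alignment-aware, SPADE-style conditional normalization layer could be structured in PyTorch. It is a minimal illustration only: the class name, layer sizes, and the exact use of the misalignment mask are our assumptions rather than the authors' released implementation (see the linked repository for that).

```python
import torch.nn as nn
import torch.nn.functional as F


class AliasNormSketch(nn.Module):
    """Illustrative alignment-aware conditional normalization.

    A hypothetical sketch in the spirit of ALIAS normalization:
    activations in misaligned regions are suppressed before
    standardization, and the segmentation layout is reinjected via
    spatially varying modulation, as in SPADE-style normalization.
    """

    def __init__(self, channels, seg_channels, hidden=128):
        super().__init__()
        # Parameter-free standardization of the incoming activations.
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        # A small conv net maps the segmentation map to per-pixel
        # modulation parameters (hidden width is an assumption).
        self.shared = nn.Sequential(
            nn.Conv2d(seg_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, channels, 3, padding=1)
        self.beta = nn.Conv2d(hidden, channels, 3, padding=1)

    def forward(self, x, seg, misalign_mask):
        # misalign_mask: 1 inside misaligned areas, 0 elsewhere.
        mask = F.interpolate(misalign_mask, size=x.shape[2:], mode='nearest')
        # Zero out features in the misaligned region before standardizing,
        # so warped-cloth artifacts do not leak into the statistics.
        x = self.norm(x * (1 - mask))
        # Reinject the segmentation layout via spatial modulation.
        h = self.shared(F.interpolate(seg, size=x.shape[2:], mode='nearest'))
        return x * (1 + self.gamma(h)) + self.beta(h)
```

The key design idea sketched here is that masking the misaligned region before computing normalization statistics prevents artifacts from the warped clothes from propagating, while the segmentation map restores the semantic layout that the normalization would otherwise wash out.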