Clothing segmentation and fine-grained attribute recognition are challenging tasks at the intersection of computer vision and fashion: given an input image of a person, they segment every clothing instance in the ensemble and recognize the detailed attributes of each garment. Many new models have been developed for these tasks in recent years; nevertheless, segmentation accuracy remains unsatisfactory for layered clothing and for fashion items at different scales. In this paper, we propose a new DEtection TRansformer (DETR) based method that segments ensemble clothing instances and recognizes their fine-grained attributes with high accuracy. Within this model, we introduce a \textbf{multi-layered attention module} that aggregates features at different scales, identifies the components of a single instance across those scales, and merges them together. We train our model on the Fashionpedia dataset and demonstrate that it surpasses state-of-the-art (SOTA) models on layered clothing segmentation and fine-grained attribute recognition.
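The multi-layered attention module is only described at a high level above. As a rough illustration, the following minimal PyTorch sketch shows one plausible reading: features from several scales are projected to a shared width, flattened into one token sequence, and attended to by a set of learned instance queries so each query can gather and merge the components of a single garment across scales. All module names, shapes, and the DETR-style learned queries are illustrative assumptions, not the exact architecture.

```python
import torch
import torch.nn as nn

class MultiScaleAttention(nn.Module):
    """Illustrative sketch (not the paper's code) of a multi-layered
    attention module: multi-scale features are aggregated into one
    token sequence and merged per instance via query attention."""

    def __init__(self, in_channels=(256, 512, 1024), dim=256,
                 num_queries=100, num_heads=8):
        super().__init__()
        # one 1x1 projection per scale so all features share `dim`
        self.proj = nn.ModuleList(
            nn.Conv2d(c, dim, kernel_size=1) for c in in_channels)
        # learned instance queries, as in DETR-style decoders (assumed)
        self.queries = nn.Embedding(num_queries, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # scale embedding so tokens from different scales are distinguishable
        self.scale_embed = nn.Embedding(len(in_channels), dim)

    def forward(self, feats):
        # feats: list of (B, C_i, H_i, W_i) maps from a backbone/FPN
        tokens = []
        for i, (f, p) in enumerate(zip(feats, self.proj)):
            t = p(f).flatten(2).transpose(1, 2)   # (B, H_i*W_i, dim)
            tokens.append(t + self.scale_embed.weight[i])
        tokens = torch.cat(tokens, dim=1)          # aggregate all scales
        q = self.queries.weight.unsqueeze(0).expand(tokens.size(0), -1, -1)
        # each query attends jointly over every scale and merges what it finds
        out, _ = self.attn(q, tokens, tokens)
        return out  # (B, num_queries, dim) per-instance embeddings

# smoke test with dummy multi-scale features
if __name__ == "__main__":
    feats = [torch.randn(2, 256, 64, 64),
             torch.randn(2, 512, 32, 32),
             torch.randn(2, 1024, 16, 16)]
    print(MultiScaleAttention()(feats).shape)  # torch.Size([2, 100, 256])
```

Concatenating tokens from every scale before the attention step is what lets a single query attend jointly over coarse and fine features, so parts of one garment that appear at different scales can be merged into one instance embedding.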