For a fashion outfit to be aesthetically pleasing, the garments that constitute it need to be visually compatible in aspects such as style, category and color. With the advent and ubiquity of deep learning models for computer vision, interest has grown in the task of visual compatibility detection, with the aim of developing high-quality fashion outfit recommendation systems. Previous works have defined visual compatibility as a binary classification task, with the items in an outfit being considered either fully compatible or fully incompatible. However, this formulation is not applicable to Outfit Maker applications, where users create their own outfits and need to know which specific items may be incompatible with the rest of the outfit. To address this, we propose the Visual InCompatibility TransfORmer (VICTOR), which is optimized for two tasks: 1) overall compatibility prediction as regression and 2) the detection of mismatching items. Unlike previous works that rely either on feature extraction from ImageNet-pretrained models or on end-to-end fine-tuning, we utilize fashion-specific contrastive language-image pre-training to fine-tune computer vision neural networks on fashion imagery. Moreover, we build upon the Polyvore outfit benchmark to generate partially mismatching outfits, creating a new dataset termed Polyvore-MISFITs, which is used to train VICTOR. A series of ablation and comparative analyses show that the proposed architecture can compete with and even surpass the current state of the art on Polyvore datasets while reducing the instance-wise floating-point operations by 88%, striking a balance between high performance and efficiency.
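The abstract does not spell out how partially mismatching outfits are generated or how the regression target is defined. The sketch below is one plausible reading, not the authors' procedure: it assumes that a mismatch is created by swapping an item for a random item of the same category, and that the compatibility score is the fraction of items left unchanged. The function name `make_misfit`, the `catalog` structure and the scoring rule are all hypothetical.

```python
import random


def make_misfit(outfit, catalog, num_swaps, rng):
    """Return (new_outfit, mismatch_mask, score) for one compatible outfit.

    outfit:    list of (item_id, category) pairs assumed to be compatible.
    catalog:   dict mapping category -> list of candidate item_ids (assumed).
    num_swaps: how many items to replace with random same-category items.
    rng:       a random.Random instance, for reproducibility.
    """
    n = len(outfit)
    if not 0 <= num_swaps <= n:
        raise ValueError("num_swaps must be between 0 and len(outfit)")

    swap_positions = set(rng.sample(range(n), num_swaps))
    used_ids = {item_id for item_id, _ in outfit}

    new_outfit, mismatch_mask = [], []
    for pos, (item_id, category) in enumerate(outfit):
        if pos in swap_positions:
            # Only draw items that are not already part of the outfit.
            candidates = [c for c in catalog[category] if c not in used_ids]
            if not candidates:
                raise ValueError(f"no replacement available for {category!r}")
            replacement = rng.choice(candidates)
            used_ids.add(replacement)
            new_outfit.append((replacement, category))
            mismatch_mask.append(1)  # target for mismatching-item detection
        else:
            new_outfit.append((item_id, category))
            mismatch_mask.append(0)

    # Assumed regression target: share of items that were left untouched.
    score = 1.0 - num_swaps / n
    return new_outfit, mismatch_mask, score
```

Under these assumptions, each generated sample carries both targets named in the abstract: a scalar score for the regression task and a per-item binary mask for mismatch detection.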