For a fashion outfit to be considered aesthetically pleasing, its constituent garments need to be visually compatible in aspects such as style, category, and color. Previous works have framed visual compatibility as a binary classification task, with the items in an outfit considered either fully compatible or fully incompatible. However, this formulation is not applicable to Outfit Maker applications, where users create their own outfits and need to know which specific items may be incompatible with the rest of the outfit. To address this, we propose the Visual InCompatibility TransfORmer (VICTOR), which is optimized for two tasks: 1) overall compatibility as regression and 2) the detection of mismatching items. We also utilize fashion-specific contrastive language-image pre-training to fine-tune computer vision neural networks on fashion imagery. We build upon the Polyvore outfit benchmark to generate partially mismatching outfits, creating a new dataset termed Polyvore-MISFITs, which is used to train VICTOR. A series of ablation and comparative analyses show that the proposed architecture can compete with and even surpass the current state of the art on Polyvore datasets while reducing the instance-wise floating-point operations by 88%, striking a balance between high performance and efficiency. We release our code at https://github.com/stevejpapad/Visual-InCompatibility-Transformer
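To make the two training targets concrete, the sketch below shows one way a partially mismatching outfit could be derived from a compatible one. It is an illustration under stated assumptions, not the released implementation: the function name `make_misfit`, the uniform random replacement strategy, and the score defined as the fraction of untouched items are all hypothetical choices that the abstract does not specify.

```python
import random


def make_misfit(outfit, catalog, n_replace, rng):
    """Swap n_replace items of a compatible outfit for random catalog items.

    Returns (new_outfit, mismatch_labels, compatibility_score), where
    mismatch_labels[i] is 1 if position i was replaced, else 0.
    """
    if not 0 <= n_replace <= len(outfit):
        raise ValueError("n_replace must be between 0 and len(outfit)")

    # Choose which positions of the outfit to corrupt.
    positions = set(rng.sample(range(len(outfit)), n_replace))

    new_outfit, labels = [], []
    for i, item in enumerate(outfit):
        if i in positions:
            # Assumption: any catalog item not already used is a valid
            # replacement; a real pipeline might restrict by category.
            candidates = [c for c in catalog
                          if c not in outfit and c not in new_outfit]
            new_outfit.append(rng.choice(candidates))
            labels.append(1)
        else:
            new_outfit.append(item)
            labels.append(0)

    # Assumption: the regression target is the share of untouched items.
    score = 1.0 - n_replace / len(outfit)
    return new_outfit, labels, score


rng = random.Random(0)
outfit = ["top_1", "jeans_4", "boots_2", "bag_7"]
catalog = outfit + ["top_9", "skirt_3", "heels_5", "hat_6"]
new_outfit, labels, score = make_misfit(outfit, catalog, 1, rng)
```

Under these assumptions, `score` would serve as the target for the overall compatibility regression and `labels` as the per-item targets for mismatch detection.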