Learning an effective outfit-level representation is critical for predicting the compatibility of items in an outfit and for retrieving complementary items for a partial outfit. We present OutfitTransformer, a framework that uses task-specific tokens and the self-attention mechanism to learn outfit-level representations that encode the compatibility relationships among all items in an outfit, addressing both compatibility prediction and complementary item retrieval. For compatibility prediction, we design an outfit token that captures a global outfit representation and train the framework with a classification loss. For complementary item retrieval, we design a target item token that additionally takes the target item specification (a category or text description) into account. We train the framework with a proposed set-wise outfit ranking loss to generate a target item embedding from an outfit and a target item specification. The generated embedding is then used to retrieve compatible items that match the rest of the outfit. We also adopt a pre-training approach and a curriculum learning strategy to improve retrieval performance. Because the framework learns at the outfit level, it learns a single embedding that captures higher-order relations among multiple items more effectively than pairwise methods. Experiments demonstrate that our approach outperforms state-of-the-art methods on compatibility prediction, fill-in-the-blank, and complementary item retrieval tasks. We further validate the quality of our retrieval results with a user study.
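To make the outfit-token idea concrete, the following is a minimal sketch and not the implementation behind the reported results. It assumes a single self-attention layer with identity query/key/value projections, a hypothetical linear classification head (`w`, `b`), hand-picked toy embeddings, and binary cross-entropy as the classification loss; none of these specifics are stated in the abstract. The outfit token is prepended to the item embeddings, and its encoded output serves as the global outfit representation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity Q/K/V projections, so this layer has no learned weights.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    encoded = []
    for query in tokens:
        weights = softmax([dot(query, key) / scale for key in tokens])
        encoded.append(
            [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        )
    return encoded


def compatibility_score(item_embeddings, outfit_token, w, b):
    """Prepend the outfit token, encode, and classify its output position."""
    tokens = [outfit_token] + item_embeddings
    outfit_repr = self_attention(tokens)[0]  # global outfit representation
    logit = dot(w, outfit_repr) + b          # hypothetical linear head
    return 1.0 / (1.0 + math.exp(-logit))    # probability the outfit is compatible


def bce_loss(prob, label):
    """Binary cross-entropy, assumed here as the classification loss."""
    eps = 1e-12
    return -(label * math.log(prob + eps) + (1 - label) * math.log(1 - prob + eps))


# Toy 4-dimensional item embeddings (hypothetical values).
items = [
    [0.2, 0.1, -0.3, 0.5],
    [0.4, -0.2, 0.1, 0.0],
    [-0.1, 0.3, 0.2, 0.1],
]
outfit_token = [0.1, -0.1, 0.05, 0.2]
w, b = [0.5, -0.5, 0.25, 0.1], 0.0

score = compatibility_score(items, outfit_token, w, b)
loss = bce_loss(score, 1)
```

Because this sketch uses no positional encoding, the score does not depend on the order in which the items are listed, which matches treating an outfit as a set. A trained model would replace the identity projections and the toy head with learned parameters.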