SHIFT15M: Fashion-specific dataset for set-to-set matching with several distribution shifts

This paper addresses the problem of set-to-set matching, which involves matching two different sets of items according to given criteria, especially when the items are high-dimensional, such as images. Although neural networks have been applied to this problem, most machine learning-based approaches assume that training and test data follow the same distribution, an assumption that often fails in real-world scenarios. To address this limitation, we introduce SHIFT15M, a dataset for evaluating set-to-set matching models when the data distribution changes between training and testing. Benchmark experiments show that naive methods suffer a performance drop under distribution shift. We also provide software for handling the SHIFT15M dataset easily; its URL will be released after publication of this manuscript. We believe the proposed SHIFT15M dataset provides a valuable resource for evaluating set-to-set matching models under distribution shift.
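To make the two ideas above concrete, here is a minimal sketch under stated assumptions: each set is a NumPy array of item embeddings (one row per item, e.g. image features); the matching score is the mean pairwise cosine similarity, a simple stand-in rather than any model benchmarked in the paper; and each set carries a hypothetical `year` attribute used to split training and test data so that their distributions differ. The function names are illustrative and are not part of the SHIFT15M software.

```python
import numpy as np


def set_matching_score(x: np.ndarray, y: np.ndarray) -> float:
    """Score two sets of item embeddings (one row per item).

    Uses the mean pairwise cosine similarity, an illustrative baseline,
    not the scoring function used with SHIFT15M.
    """
    # Normalize each item embedding to unit length.
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    # Average over all |x| * |y| item pairs.
    return float((xn @ yn.T).mean())


def shift_split(years, train_years, test_years):
    """Return train/test indices drawn from different periods.

    `years` is a hypothetical per-set attribute; training and test sets
    come from disjoint periods, simulating a distribution shift.
    """
    years = np.asarray(years)
    train_idx = np.flatnonzero(np.isin(years, train_years))
    test_idx = np.flatnonzero(np.isin(years, test_years))
    return train_idx, test_idx


rng = np.random.default_rng(0)
a = rng.normal(size=(4, 8))  # a set of 4 items
b = rng.normal(size=(3, 8))  # a set of 3 items
print(set_matching_score(a, b))
print(shift_split([2015, 2016, 2017, 2015], train_years=[2015], test_years=[2017]))
```

Averaging over all item pairs makes the score independent of item order, a property any set-to-set matching function needs; learned models replace the fixed cosine average with trained set encoders or cross-set attention.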


