Instance-discriminative self-supervised representation learning has attracted attention thanks to its unsupervised nature and the informative feature representations it provides for downstream tasks. In practice, it commonly uses far more negative samples than the number of supervised classes. However, there is an inconsistency in the existing analysis: theoretically, a large number of negative samples degrades classification performance on a downstream supervised task, while empirically they improve it. We provide a novel framework that analyzes this empirical result regarding negative samples using the coupon collector's problem. Our bound can implicitly incorporate the supervised loss of the downstream task into the self-supervised loss by increasing the number of negative samples. We confirm that our proposed analysis holds on real-world benchmark datasets.
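To build intuition for the coupon-collector connection, the following is a minimal sketch (not the paper's method) assuming negative samples are drawn i.i.d. uniformly over the latent supervised classes: it computes, via inclusion-exclusion, the probability that a draw of negatives covers every class, and the classical expected number of draws needed for full coverage. The class count `C` and negative-sample counts `K` are hypothetical illustration values.

```python
from math import comb


def coverage_probability(num_classes: int, num_draws: int) -> float:
    """Probability that `num_draws` i.i.d. uniform draws over `num_classes`
    latent classes hit every class at least once (inclusion-exclusion)."""
    return sum(
        (-1) ** j * comb(num_classes, j) * ((num_classes - j) / num_classes) ** num_draws
        for j in range(num_classes + 1)
    )


def expected_draws_to_cover(num_classes: int) -> float:
    """Expected number of draws to see all classes: C * H_C (coupon collector)."""
    return num_classes * sum(1.0 / k for k in range(1, num_classes + 1))


if __name__ == "__main__":
    C = 10  # hypothetical number of downstream supervised classes
    for K in (16, 32, 64, 128):  # hypothetical negative-sample counts
        p = coverage_probability(C, K)
        print(f"K={K:4d}: P(all {C} classes among negatives) = {p:.4f}")
    print(f"expected draws to cover all {C} classes: {expected_draws_to_cover(C):.1f}")
```

Under this uniform-sampling assumption, the coverage probability rises quickly once the number of negatives exceeds roughly C * H_C, which is consistent with the abstract's claim that increasing negatives lets the self-supervised loss implicitly account for every downstream class.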