Class-imbalanced datasets are known to bias models toward the majority classes. In this project, we pose two research questions: 1) when is the class-imbalance problem more severe in self-supervised pre-training? and 2) can offline clustering of feature representations help pre-training on class-imbalanced data? To investigate the former, our experiments vary the degree of {\it class-imbalance} while training the baseline models, SimCLR and SimSiam, on the CIFAR-10 dataset. To answer the latter, we train one expert model on each subset of the feature clusters, and then distill the knowledge of the expert models into a single model whose performance we compare against our baselines.
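The cluster-then-distill pipeline described above can be sketched as follows. This is a minimal toy illustration only, assuming plain k-means over 2-D features and nearest-mean stand-ins for the per-cluster "experts"; it is not the actual SimCLR/SimSiam setup, and every name here (`kmeans`, `experts`, `targets`) is a placeholder:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced "feature" set: a large blob and a small blob in 2-D.
feats = np.vstack([rng.normal(0.0, 1.0, (180, 2)),   # majority cluster
                   rng.normal(5.0, 1.0, (20, 2))])   # minority cluster

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means; returns (centroids, cluster assignments)."""
    r = np.random.default_rng(seed)
    cent = x[r.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(x[:, None, :] - cent[None, :, :], axis=-1)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                cent[j] = x[assign == j].mean(axis=0)
    return cent, assign

# 1) Offline clustering of the feature representations.
cent, assign = kmeans(feats, k=2)

# 2) One "expert" per cluster subset (here: just the cluster mean,
#    standing in for a model trained on that subset).
experts = [feats[assign == j].mean(axis=0) for j in range(2)]

# 3) "Distillation" target for a single student model: each point is
#    paired with the output of its own cluster's expert.
targets = np.stack([experts[j] for j in assign])
```

The real pipeline would replace the nearest-mean experts with models trained on each cluster subset, and `targets` with the experts' soft outputs used as the distillation signal for the single student model.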