Classification is a fundamental task in machine learning and plays a central role in natural language processing (NLP) and computer vision (CV). In a supervised learning setting, labels are required for the classification task; deep neural models in particular need large amounts of high-quality labeled data for training. However, when a new domain emerges, labels are often hard or expensive to acquire. Transfer learning offers an option: transferring knowledge from a source domain to a target domain. A key challenge is that the two domains can differ, either in their feature distributions or in their class distributions due to the nature of the samples. In this work, we evaluate existing transfer learning approaches, both traditional and deep models, for detecting the bias introduced by imbalanced classes. In addition, we propose an approach to bridge the gap caused by the domain class imbalance issue.