Binary classification (BC) is a practical task that is ubiquitous in real-world problems, such as distinguishing healthy from unhealthy objects in biomedical diagnostics and defective from non-defective products in manufacturing inspection. However, solving it effectively usually requires fully annotated data, and having domain experts collect such annotations is tedious and expensive. For multi-class classification, by contrast, several influential semi-supervised learning techniques have been devised, and they rely heavily on stochastic data augmentation. In this study, we demonstrate that stochastic data augmentation is less suitable for typical BC problems because it can remove crucial features that strictly distinguish positive from negative samples. To address this issue, we propose a new representation learning method that solves the BC problem from a few labels using a random k-pair cross-distance learning mechanism. First, using a few labeled samples, the encoder network learns to project positive and negative samples into an angular space so as to maximize their inter-class distances and minimize their intra-class distances. Second, the classifier learns to discriminate between positive and negative samples using on-the-fly labels generated from the angular space and the labeled samples. Extensive experiments were conducted on four real-world, publicly available BC datasets. With few labels and without any data augmentation, the proposed method outperformed state-of-the-art semi-supervised and self-supervised learning methods. Moreover, with 10% of the data labeled, our semi-supervised classifier achieved accuracy competitive with the fully supervised setting.
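To make the two stages more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the "angular space" distance is the arc-cosine of cosine similarity scaled to [0, 1], that the random k-pair objective is the mean intra-class distance minus the mean inter-class distance over k randomly drawn pairs, and that on-the-fly labels come from the nearer class prototype. The function names (`angular_distance`, `cross_distance_loss`, `pseudo_labels`) and these choices are hypothetical illustrations; the abstract does not specify them.

```python
import numpy as np


def angular_distance(u, v):
    """Row-wise angular distance between embeddings, scaled to [0, 1]."""
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    cos = np.clip(np.sum(u * v, axis=1), -1.0, 1.0)
    return np.arccos(cos) / np.pi


def random_pairs(x, y, k, rng):
    """Draw k random (row of x, row of y) pairs, with replacement."""
    i = rng.integers(0, len(x), size=k)
    j = rng.integers(0, len(y), size=k)
    return x[i], y[j]


def cross_distance_loss(z_pos, z_neg, k, rng):
    """Assumed objective: small intra-class and large inter-class distances.

    Lower is better; the value lies in [-1, 1].
    """
    intra_pos = angular_distance(*random_pairs(z_pos, z_pos, k, rng)).mean()
    intra_neg = angular_distance(*random_pairs(z_neg, z_neg, k, rng)).mean()
    inter = angular_distance(*random_pairs(z_pos, z_neg, k, rng)).mean()
    return 0.5 * (intra_pos + intra_neg) - inter


def pseudo_labels(z_unlabeled, z_pos, z_neg):
    """Assumed on-the-fly labeling: 1 if nearer the positive prototype, else 0."""
    proto_pos = z_pos.mean(axis=0, keepdims=True)
    proto_neg = z_neg.mean(axis=0, keepdims=True)
    d_pos = angular_distance(z_unlabeled, proto_pos)
    d_neg = angular_distance(z_unlabeled, proto_neg)
    return (d_pos < d_neg).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 2-D embeddings standing in for encoder outputs of labeled samples.
    z_pos = np.array([[1.0, 0.0], [1.0, 0.01]])
    z_neg = np.array([[-1.0, 0.0], [-1.0, 0.01]])
    z_unlabeled = np.array([[0.9, 0.1], [-0.8, 0.2]])
    print("loss:", cross_distance_loss(z_pos, z_neg, k=8, rng=rng))
    print("pseudo-labels:", pseudo_labels(z_unlabeled, z_pos, z_neg))
```

In a real training loop the embeddings would come from a differentiable encoder and the loss would be minimized by gradient descent; the sketch only evaluates the quantities on fixed vectors.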