In this work we examine the classification accuracy and robustness of a state-of-the-art semi-supervised learning (SSL) algorithm applied to the morphological classification of radio galaxies. We test whether SSL with fewer labels can achieve test accuracies comparable to the supervised state of the art, and whether this holds when incorporating previously unseen data. We find that, for the radio galaxy classification problem considered, SSL provides additional regularisation and outperforms the baseline test accuracy. However, in contrast to model performance metrics reported on computer science benchmarking data-sets, we find that the improvement is limited to a narrow range of label volumes, with performance falling off rapidly at low label volumes. Additionally, we show that SSL does not improve model calibration, regardless of whether classification is improved. Moreover, we find that when different underlying catalogues drawn from the same radio survey are used to provide the labelled and unlabelled data-sets required for SSL, a significant drop in classification performance is observed, highlighting the difficulty of applying SSL techniques under data-set shift. We show that a class-imbalanced unlabelled data pool negatively affects performance through prior probability shift, which we suggest may explain this performance drop. We also show that the Fréchet Distance between labelled and unlabelled data-sets, used as a measure of data-set shift, can provide a prediction of model performance. However, for typical radio galaxy data-sets with labelled sample volumes of O(1000), the sample variance associated with this technique is high, and the technique is in general not sufficiently robust to replace a train-test cycle.
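As an illustration of the Fréchet Distance mentioned above, the following is a minimal sketch of how such a distance between two data-sets might be computed. It assumes that each data-set has already been mapped to feature vectors by some embedding (the abstract does not specify which), and that each feature set is summarised by a Gaussian fit, as is done for the widely used Fréchet Inception Distance. The function name `frechet_distance` and the array shapes are hypothetical and are not taken from this work.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Squared Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features).
    Assumption: features come from some fixed embedding of the images.
    """
    feats_a = np.asarray(feats_a, dtype=float)
    feats_b = np.asarray(feats_b, dtype=float)

    # Fit a Gaussian (mean and covariance) to each feature set.
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((cov_a cov_b)^(1/2)) is the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite covariances (up to numerical noise).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

With sample sizes of O(1000), the estimated means and covariances are themselves noisy, so repeating the calculation over resampled subsets gives a sense of the sample variance of the estimate.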