Addressing class imbalance in long-tailed semi-supervised learning (SSL) poses significant challenges because the marginal distribution of the unlabeled data is often unknown and may differ from that of the labeled data. The first challenge is to avoid biasing the pseudo-labels towards an incorrect distribution during training, such as that of the labeled data or a balanced distribution. The second is to nonetheless produce balanced predictions over the unlabeled distribution at inference. To address both challenges, we propose a three-faceted solution: a flexible distribution alignment that progressively aligns the classifier from a dynamically estimated unlabeled prior towards a balanced distribution; a soft consistency regularization that exploits the underconfident pseudo-labels discarded by threshold-based methods; and a scheme for expanding the unlabeled set with input data from the labeled partition. This last facet responds to the commonly overlooked fact that disjoint labeled and unlabeled partitions deny the labeled set the benefits of strong data augmentation. Our overall framework requires no additional training cycles, so it will align, distill, and augment everything all at once (ADALLO). Our extensive evaluations of ADALLO on imbalanced SSL benchmark datasets, including CIFAR10-LT, CIFAR100-LT, and STL10-LT with varying degrees of class imbalance, amounts of labeled data, and distribution mismatch, demonstrate significant improvements in imbalanced SSL performance under large distribution mismatch, as well as competitiveness with state-of-the-art methods when the labeled and unlabeled data follow the same marginal distribution. Our code will be released upon paper acceptance.
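The abstract describes the first two facets only at a high level. As a rough, hypothetical PyTorch sketch of how progressive distribution alignment and soft consistency regularization might be realized (the function names, the linear blending schedule, and the EMA momentum below are our own assumptions for illustration, not the authors' exact formulation):

```python
import torch
import torch.nn.functional as F

# Illustrative sketch only; names and schedules are assumptions, not the paper's method.
num_classes = 10
ema_prior = torch.full((num_classes,), 1.0 / num_classes)  # running estimate of the unlabeled prior

def distribution_aligned_pseudo_labels(logits_weak, step, total_steps, momentum=0.999):
    """Re-align pseudo-labels from a dynamically estimated unlabeled prior,
    progressively shifting toward a balanced (uniform) distribution."""
    global ema_prior
    probs = F.softmax(logits_weak.detach(), dim=-1)
    # Dynamically estimate the unlabeled marginal via an exponential moving average.
    ema_prior = momentum * ema_prior + (1 - momentum) * probs.mean(dim=0)
    # Blend factor grows from 0 (trust the estimated prior) to 1 (fully balanced).
    t = step / total_steps
    target_prior = (1 - t) * ema_prior + t * torch.full_like(ema_prior, 1.0 / num_classes)
    # Re-weight predictions toward the target prior, then renormalize.
    aligned = probs * (target_prior / ema_prior.clamp_min(1e-8))
    return aligned / aligned.sum(dim=-1, keepdim=True)

def soft_consistency_loss(logits_strong, aligned_targets):
    """Confidence-weighted cross-entropy over *all* unlabeled samples,
    instead of discarding those below a hard confidence threshold."""
    conf, _ = aligned_targets.max(dim=-1)           # per-sample confidence as a soft weight
    log_p = F.log_softmax(logits_strong, dim=-1)
    per_sample = -(aligned_targets * log_p).sum(dim=-1)
    return (conf * per_sample).mean()
```

Under this reading, low-confidence pseudo-labels still contribute to the consistency loss, just with proportionally smaller weight, which is one plausible way to "distill" the signal that thresholding would throw away.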