RETRIEVE: Coreset Selection for Efficient and Robust Semi-Supervised Learning

Semi-supervised learning (SSL) algorithms have had great success in recent years in limited labeled data regimes. However, the current state-of-the-art SSL algorithms are computationally expensive and entail significant compute time and energy requirements. This can be a serious limitation for many smaller companies and academic groups. Our main insight is that training on a subset of the unlabeled data, instead of the entire unlabeled set, enables current SSL algorithms to converge faster, thereby reducing computational costs significantly. In this work, we propose RETRIEVE, a coreset selection framework for efficient and robust semi-supervised learning. RETRIEVE selects the coreset by solving a mixed discrete-continuous bi-level optimization problem such that the selected coreset minimizes the labeled set loss. We use a one-step gradient approximation and show that the discrete optimization problem is approximately submodular, thereby enabling simple greedy algorithms to obtain the coreset. We empirically demonstrate on several real-world datasets that existing SSL algorithms such as VAT, Mean-Teacher, and FixMatch, when used with RETRIEVE, achieve a) faster training times and b) better performance when the unlabeled data contains out-of-distribution (OOD) data or is imbalanced. More specifically, we show that with minimal accuracy degradation, RETRIEVE achieves a speedup of around 3× in the traditional SSL setting and a speedup of 5× compared to state-of-the-art (SOTA) robust SSL algorithms in the case of imbalance and OOD data.
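
The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that per-example gradients of the unlabeled loss are already available as rows of a matrix, that the labeled loss can be evaluated as a plain function of the parameters, and that the coreset is unweighted. The names `greedy_coreset`, `unlabeled_grads`, and `labeled_loss`, as well as the toy quadratic loss, are hypothetical and chosen only for illustration. The sketch approximates the inner training problem by a single gradient step on the candidate subset, then greedily adds the unlabeled example whose inclusion gives the lowest labeled loss after that step.

```python
import numpy as np


def greedy_coreset(theta, unlabeled_grads, labeled_loss, lr, budget):
    """Greedily pick `budget` unlabeled indices.

    One-step approximation (an assumption of this sketch): training on a
    subset S is replaced by a single step theta - lr * sum_{j in S} g_j,
    and S is grown one element at a time to minimize the labeled loss
    evaluated at the updated parameters.
    """
    n = len(unlabeled_grads)
    selected = []
    available = np.ones(n, dtype=bool)
    grad_sum = np.zeros_like(theta, dtype=float)

    for _ in range(budget):
        best_idx, best_loss = -1, np.inf
        for i in range(n):
            if not available[i]:
                continue
            # Labeled loss after one step on the current subset plus example i.
            candidate = theta - lr * (grad_sum + unlabeled_grads[i])
            loss = labeled_loss(candidate)
            if loss < best_loss:
                best_idx, best_loss = i, loss
        selected.append(best_idx)
        available[best_idx] = False
        grad_sum += unlabeled_grads[best_idx]
    return selected


# Toy setup (hypothetical): the labeled loss is a quadratic centered on `target`.
target = np.array([1.0, 0.0])


def toy_labeled_loss(params):
    return float(np.sum((params - target) ** 2))


theta0 = np.zeros(2)
toy_grads = np.array([
    [-1.0, 0.0],   # step moves parameters toward the target
    [1.0, 0.0],    # step moves away from the target (e.g. an OOD example)
    [0.0, -1.0],   # step is orthogonal to the useful direction
    [-1.0, 0.1],   # mostly helpful
])

selected = greedy_coreset(theta0, toy_grads, toy_labeled_loss, lr=0.5, budget=2)
print(selected)
```

In this toy run the greedy procedure picks the two examples whose gradients point toward the labeled optimum and skips the example whose gradient points away from it, which mirrors the robustness to OOD data that the abstract claims. Each greedy round here re-evaluates the labeled loss for every remaining candidate, so the sketch is only practical for small problems.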
