Adversarial contrastive learning (ACL) does not require expensive data annotations but outputs a robust representation that withstands adversarial attacks and also generalizes to a wide range of downstream tasks. However, ACL requires tremendous running time to generate the adversarial variants of all training data, which limits its scalability to large datasets. To speed up ACL, this paper proposes a robustness-aware coreset selection (RCS) method. RCS does not require label information and searches for an informative subset that minimizes a representational divergence, which is the distance between the representations of natural data and their virtual adversarial variants. The vanilla solution of RCS that traverses all possible subsets is computationally prohibitive. Therefore, we theoretically transform RCS into a surrogate problem of submodular maximization, for which greedy search is an efficient solution with an optimality guarantee for the original problem. Empirically, our comprehensive results corroborate that RCS can speed up ACL by a large margin without significantly hurting robustness and standard transferability. Notably, to the best of our knowledge, we are the first to conduct ACL efficiently on the large-scale ImageNet-1K dataset to obtain an effective robust representation via RCS.
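Below is a minimal, self-contained PyTorch sketch of the two ingredients named above: a label-free representational divergence computed against virtual adversarial variants, and greedy selection of a coreset by maximizing a monotone submodular surrogate. The toy encoder, the PGD-style hyperparameters (`eps`, `alpha`, `steps`), and the facility-location surrogate used as the set function are illustrative assumptions, not the paper's exact objective or implementation.

```python
# Sketch only: RCS as described in the abstract couples the representational
# divergence with submodular greedy search; here a facility-location function
# over representations stands in for the paper's surrogate gain.
import torch
import torch.nn as nn
import torch.nn.functional as F


def representational_divergence(encoder, x, eps=8 / 255, alpha=2 / 255, steps=5):
    """Label-free PGD-style search for virtual adversarial variants that maximize
    the distance between representations of natural and perturbed inputs; returns
    a per-example divergence score."""
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        rd = F.mse_loss(encoder(x + delta), encoder(x).detach())
        grad = torch.autograd.grad(rd, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    with torch.no_grad():
        return F.mse_loss(encoder(x + delta), encoder(x), reduction="none").flatten(1).mean(1)


def greedy_coreset(features, budget):
    """Greedy maximization of a (truncated) facility-location function, a standard
    monotone submodular objective: at each step add the candidate with the largest
    marginal gain in total coverage sum_i max_{j in S} sim(i, j)."""
    feats = F.normalize(features, dim=1)
    sim = feats @ feats.T                      # pairwise cosine similarity
    n = sim.size(0)
    selected, coverage = [], torch.zeros(n)
    for _ in range(budget):
        gains = torch.clamp(sim - coverage.unsqueeze(1), min=0).sum(dim=0)
        if selected:                           # never re-select an element
            gains[torch.tensor(selected)] = -float("inf")
        best = int(gains.argmax())
        selected.append(best)
        coverage = torch.maximum(coverage, sim[:, best])
    return selected


if __name__ == "__main__":
    # Toy usage: a linear encoder and random "images" stand in for a real setup.
    encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
    x = torch.rand(256, 3, 32, 32)
    rd_scores = representational_divergence(encoder, x)   # per-example divergence
    coreset = greedy_coreset(encoder(x).detach(), budget=64)
    print(rd_scores.mean().item(), len(coreset))
```

Because the facility-location surrogate is monotone and submodular, the greedy loop above inherits the classic (1 - 1/e) approximation guarantee, which is the kind of optimality guarantee the abstract refers to for the surrogate problem.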