One-shot coreset selection aims to select, for a given pruning rate, a subset of the training data such that models subsequently trained only on that subset achieve high accuracy. State-of-the-art coreset selection methods typically assign an importance score to each example and select the most important examples to form a coreset. These methods perform well at low pruning rates, but at high pruning rates they have been found to suffer a catastrophic accuracy drop, performing worse than even random coreset selection. In this paper, we explore the reasons for this accuracy drop both theoretically and empirically. We extend previous theoretical results that bound model loss in terms of the coverage provided by the coreset. Inspired by these theoretical results, we propose a novel coverage-based metric and, using this metric, find that coresets selected by importance-based methods at high pruning rates can be expected to perform poorly compared to random coresets because they provide worse data coverage. We then propose a new coreset selection method, Coverage-centric Coreset Selection (CCS), which jointly considers overall data coverage, as measured by the proposed metric, and the importance of each example. We evaluate CCS on four datasets and show that it achieves significantly better accuracy than state-of-the-art coreset selection methods and random sampling at high pruning rates, and comparable performance at low pruning rates. For example, on CIFAR10 with a 90% pruning rate, CCS achieves 7.04% better accuracy than random sampling and at least 20.16% better accuracy than popular importance-based selection methods.
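To make the contrast between importance-only selection and coverage-aware selection concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the CCS algorithm as defined in the paper: the abstract does not specify how coverage and importance are combined, so the stratified sampling over importance-score bins used here is an assumed stand-in, and the names `top_k_coreset`, `stratified_coreset`, `num_bins`, and `seed` are hypothetical.

```python
import random


def top_k_coreset(scores, pruning_rate):
    """Importance-only baseline: keep the highest-scoring examples."""
    keep = round(len(scores) * (1 - pruning_rate))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:keep])


def stratified_coreset(scores, pruning_rate, num_bins=10, seed=0):
    """Hypothetical coverage-aware variant (an assumption, not the paper's
    method): spread the selection budget across importance-score bins."""
    rng = random.Random(seed)
    keep = round(len(scores) * (1 - pruning_rate))
    lo, hi = min(scores), max(scores)
    # Fall back to a width of 1.0 when all scores are identical.
    width = (hi - lo) / num_bins or 1.0
    bins = [[] for _ in range(num_bins)]
    for i, s in enumerate(scores):
        b = min(int((s - lo) / width), num_bins - 1)
        bins[b].append(i)
    # Visit the smallest bins first so unused budget flows to larger bins.
    nonempty = sorted((b for b in bins if b), key=len)
    selected = []
    remaining = keep
    for k, members in enumerate(nonempty):
        share = remaining // (len(nonempty) - k)
        take = min(share, len(members))
        selected.extend(rng.sample(members, take))
        remaining -= take
    return sorted(selected)


if __name__ == "__main__":
    # Synthetic importance scores for 100 examples, pruned at a 90% rate.
    scores = [i / 100 for i in range(100)]
    print("top-k:     ", top_k_coreset(scores, 0.9))
    print("stratified:", stratified_coreset(scores, 0.9))
```

In this sketch the top-k coreset draws every example from the highest-score region, whereas the stratified variant keeps the same number of examples but spreads them over the whole score range, which is the kind of coverage difference the abstract attributes to importance-based methods at high pruning rates.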