Active learning is a practical field of machine learning that automates the process of selecting which data to label. Current methods reduce the labeling burden effectively but rely heavily on the model used during sampling. As a result, the sampled data often transfer poorly to new models, and the sampling is prone to bias. Both issues are critical concerns in machine learning deployment. We propose active learning methods that use combinatorial coverage to overcome these issues. The proposed methods are data-centric rather than model-centric. Our experiments show that including coverage in active learning yields sampled data that tend to transfer best to better-performing models, with sampling bias that is competitive with benchmark methods.
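To make the idea of coverage-driven, model-free sampling concrete, the following is a minimal sketch and not the authors' algorithm. It assumes features have already been discretized into integer levels, measures t-way combinatorial coverage as the fraction of possible feature-value combinations that appear in a data set, and greedily selects the unlabeled points that add the most unseen combinations. The names `interactions`, `coverage`, and `select_by_coverage` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def interactions(row, t=2):
    """Return every t-way (feature index, value) combination present in one row."""
    return {
        tuple((i, row[i]) for i in idx)
        for idx in combinations(range(len(row)), t)
    }


def coverage(rows, levels, t=2):
    """Fraction of all possible t-way value combinations that appear in rows.

    levels[i] is the number of discrete values feature i can take (assumed known).
    """
    seen = set()
    for row in rows:
        seen |= interactions(row, t)
    total = 0
    for idx in combinations(range(len(levels)), t):
        n = 1
        for i in idx:
            n *= levels[i]
        total += n
    return len(seen) / total


def select_by_coverage(labeled, pool, budget, t=2):
    """Greedily pick pool indices that add the most t-way combinations
    not yet present in the labeled set. No model is consulted."""
    seen = set()
    for row in labeled:
        seen |= interactions(row, t)
    remaining = set(range(len(pool)))
    chosen = []
    for _ in range(min(budget, len(pool))):
        # Ties are broken in favor of the lowest pool index.
        best = max(sorted(remaining),
                   key=lambda j: len(interactions(pool[j], t) - seen))
        chosen.append(best)
        remaining.discard(best)
        seen |= interactions(pool[best], t)
    return chosen


if __name__ == "__main__":
    levels = [2, 2, 2]                       # three binary features
    labeled = [(0, 0, 0)]
    pool = [(0, 0, 0), (1, 1, 1), (0, 0, 1)]
    picked = select_by_coverage(labeled, pool, budget=2)
    print(picked)
    print(coverage(labeled + [pool[j] for j in picked], levels))
```

Because the selection criterion depends only on the data, the same labeled subset can be reused with any downstream model, which is the property the abstract contrasts with model-reliant sampling.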