Active Learning (AL) techniques aim to minimize the training data required to train a model for a given task. Pool-based AL techniques start with a small initial labeled pool and then iteratively pick batches of the most informative samples for labeling. Generally, the initial pool is sampled randomly and labeled to seed the AL iterations. While recent studies have focused on evaluating the robustness of various query functions in AL, little to no attention has been given to the design of the initial labeled pool. Given the recent successes of learning representations in self-supervised/unsupervised ways, we propose to study whether an intelligently sampled initial labeled pool can improve deep AL performance. We will investigate the effect of intelligently sampled initial labeled pools, including the use of self-supervised and unsupervised strategies, on deep AL methods. We describe our experimental setup, implementation details, datasets, and performance metrics, as well as planned ablation studies, in this proposal. If intelligently sampled initial pools improve AL performance, our work could make a positive contribution to boosting AL performance with no additional annotation, developing datasets with lower annotation cost in general, and promoting further research in the use of unsupervised learning methods for AL.
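The pool-based AL loop described above can be sketched as follows. This is a minimal illustration, not the proposal's method: the `select_initial_pool` function here uses a simple farthest-point heuristic over (hypothetically self-supervised) feature vectors as a stand-in for an "intelligent" seeding strategy, and the acquisition score is a toy distance-based uncertainty proxy; all names and heuristics are our own assumptions for illustration.

```python
import numpy as np

def select_initial_pool(features, budget, rng):
    # Hypothetical "intelligent" seeding over (e.g. self-supervised) features:
    # a greedy farthest-point heuristic, used here as a stand-in for a real
    # clustering-based selection strategy.
    pool = [int(rng.integers(len(features)))]
    dists = np.linalg.norm(features - features[pool[0]], axis=1)
    while len(pool) < budget:
        nxt = int(np.argmax(dists))  # sample farthest from current pool
        pool.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(features - features[nxt], axis=1))
    return pool

def active_learning_loop(features, init_budget, query_size, rounds, rng):
    # Seed the labeled pool, then iteratively query batches for annotation.
    labeled = set(select_initial_pool(features, init_budget, rng))
    for _ in range(rounds):
        unlabeled = [i for i in range(len(features)) if i not in labeled]
        labeled_feats = features[sorted(labeled)]
        # Toy acquisition function: distance to the nearest labeled sample
        # stands in for a model-based informativeness/uncertainty score.
        scores = [np.min(np.linalg.norm(labeled_feats - features[i], axis=1))
                  for i in unlabeled]
        picked = [unlabeled[j] for j in np.argsort(scores)[-query_size:]]
        labeled.update(picked)  # "annotate" the queried batch
    return sorted(labeled)
```

In a real experiment the acquisition score would come from the model being trained (e.g. predictive entropy), and the initial pool would be chosen by clustering self-supervised representations; the skeleton of seed-then-query stays the same.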