Labeled data can be expensive to acquire in several application domains, including medical imaging, robotics, and computer vision. To train machine learning models efficiently under such high labeling costs, active learning (AL) judiciously selects the most informative data instances to label on the fly. This active sampling process can benefit from a statistical function model, which is typically captured by a Gaussian process (GP). While most GP-based AL approaches rely on a single kernel function, the present contribution advocates an ensemble of GP (EGP) models with weights adapted to the labeled data collected incrementally. Building on this novel EGP model, a suite of acquisition functions emerges based on the uncertainty and disagreement rules. An adaptively weighted ensemble of EGP-based acquisition functions is also introduced to further robustify performance. Extensive tests on synthetic and real datasets showcase the merits of the proposed EGP-based approaches relative to the single-GP-based AL alternatives.
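The idea of a data-adaptive GP ensemble driving an uncertainty-based acquisition rule can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes 1-D inputs, zero-mean GPs with unit-variance RBF kernels that differ only in lengthscale, a fixed noise variance, uniform prior weights, and weights proportional to each model's marginal likelihood on the labeled set (which, for fixed hyperparameters, equals the product of the one-step-ahead predictive likelihoods accumulated as labels arrive). The names `rbf_kernel`, `gp_predict`, `ensemble_weights`, and `select_next` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale):
    """Unit-variance squared-exponential kernel between 1-D arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_predict(x_train, y_train, x_test, lengthscale, noise_var=1e-2):
    """Predictive mean and variance (noisy output) of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train, lengthscale) + noise_var * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_test, lengthscale)           # shape (n, P)
    mean = Ks.T @ np.linalg.solve(K, y_train)
    var = 1.0 + noise_var - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.maximum(var, 1e-12)

def log_marginal_likelihood(x_train, y_train, lengthscale, noise_var=1e-2):
    """Log evidence of the labeled data under one GP model."""
    n = len(x_train)
    K = rbf_kernel(x_train, x_train, lengthscale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return (-0.5 * y_train @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2.0 * np.pi))

def ensemble_weights(x_train, y_train, lengthscales, noise_var=1e-2):
    """Posterior model weights under a uniform prior (assumption of this sketch)."""
    log_w = np.array([log_marginal_likelihood(x_train, y_train, ls, noise_var)
                      for ls in lengthscales])
    w = np.exp(log_w - log_w.max())                         # stabilize before normalizing
    return w / w.sum()

def select_next(x_train, y_train, x_pool, lengthscales, noise_var=1e-2):
    """Pick the pool point with the largest variance of the GP mixture."""
    w = ensemble_weights(x_train, y_train, lengthscales, noise_var)
    preds = [gp_predict(x_train, y_train, x_pool, ls, noise_var) for ls in lengthscales]
    means = np.array([m for m, _ in preds])                 # shape (M, P)
    variances = np.array([v for _, v in preds])             # shape (M, P)
    mix_mean = w @ means
    # Variance of a Gaussian mixture: E[var + mean^2] - (E[mean])^2
    mix_var = w @ (variances + means ** 2) - mix_mean ** 2
    return int(np.argmax(mix_var)), w, mix_var
```

In an AL loop, one would call `select_next` on the current labeled set, query the label of the returned pool index, append it to the labeled set, and repeat; the weights are recomputed at each step and so shift toward the kernels that explain the collected labels best. The mixture variance captures both per-model uncertainty and disagreement among the model means, which is one simple way to combine the two rules mentioned above.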