Sampling Through the Lens of Sequential Decision Making

Sampling is ubiquitous in machine learning methodologies. With the growth of large datasets and model complexity, we want to learn and adapt the sampling process while training a representation. Toward this goal, a variety of sampling techniques have been proposed. However, most of them either use a fixed sampling scheme or adjust the sampling scheme with simple heuristics, so they cannot choose the best samples for model training at different stages. Inspired by the System 1 and System 2 framework of "Thinking, Fast and Slow" in cognitive science, we propose a reward-guided sampling strategy called Adaptive Sample with Reward (ASR) to tackle this challenge. To the best of our knowledge, this is the first work to use reinforcement learning (RL) to address the sampling problem in representation learning. Our approach adjusts the sampling process to achieve optimal performance. We explore the geometric relationships among samples through distance-based sampling to maximize the overall cumulative reward. We apply ASR to the long-standing sampling problems in similarity-based loss functions. Empirical results on information retrieval and clustering show that ASR performs strongly across different datasets. We also discuss an intriguing phenomenon observed in our experiments, which we name the "ASR gravity well".
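
The abstract does not spell out ASR's states, actions, or reward. The sketch below illustrates the general idea only. It treats the choice of a distance window for sampling negatives as a repeated decision. An epsilon-greedy multi-armed bandit, the simplest stateless form of reward-guided sequential decision making, picks windows so as to maximize cumulative reward. This is a swapped-in technique, not the authors' RL formulation. The distance windows and the toy reward that favors negatives near distance 1.0 are hypothetical. So are all names (`sample_negative`, `EpsilonGreedyWindowPicker`, `run_toy_experiment`).

```python
import math
import random

# NOTE: a toy sketch, NOT the ASR algorithm from the paper. All names,
# distance windows, and the reward function are hypothetical.


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_negative(anchor, candidates, low, high, rng):
    """Distance-based sampling: pick a candidate whose distance to the anchor
    lies in [low, high); fall back to the nearest candidate if none does."""
    in_window = [c for c in candidates if low <= euclidean(anchor, c) < high]
    if in_window:
        return rng.choice(in_window)
    return min(candidates, key=lambda c: euclidean(anchor, c))


class EpsilonGreedyWindowPicker:
    """Epsilon-greedy bandit whose arms are distance windows (hypothetical)."""

    def __init__(self, windows, epsilon=0.1, seed=0):
        self.windows = windows
        self.epsilon = epsilon
        self.counts = [0] * len(windows)    # times each arm was played
        self.values = [0.0] * len(windows)  # running mean reward per arm
        self.rng = random.Random(seed)

    def select(self):
        # Play every arm once before trusting the estimates.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        # Explore with probability epsilon, otherwise exploit the best arm.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.windows))
        return max(range(len(self.windows)), key=lambda i: self.values[i])

    def update(self, arm, reward):
        # Incremental update of the mean reward observed for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def run_toy_experiment(steps=500, seed=0):
    windows = [(0.0, 0.5), (0.5, 1.5), (1.5, 3.5)]
    picker = EpsilonGreedyWindowPicker(windows, epsilon=0.1, seed=seed)
    rng = random.Random(seed + 1)
    anchor = (0.0, 0.0)
    # Candidate negatives on a line, at distances 0.05, 0.10, ..., 3.00.
    candidates = [(0.05 * k, 0.0) for k in range(1, 61)]
    for _ in range(steps):
        arm = picker.select()
        low, high = windows[arm]
        negative = sample_negative(anchor, candidates, low, high, rng)
        # Toy reward (assumption): negatives near distance 1.0 are most useful.
        reward = math.exp(-(euclidean(anchor, negative) - 1.0) ** 2)
        picker.update(arm, reward)
    return picker


if __name__ == "__main__":
    picker = run_toy_experiment()
    for window, n, v in zip(picker.windows, picker.counts, picker.values):
        print(f"window={window} plays={n} mean_reward={v:.3f}")
```

Under this toy reward, the controller should spend most of its plays on the middle window, (0.5, 1.5). In a real similarity-based loss, the reward would instead come from a training or validation signal, which the abstract does not specify.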

翻译：由于大型数据集和模型复杂性的增加,我们希望在培训代表时学习和调整取样过程。为了实现这一宏大的目标,提出了各种取样技术。然而,大多数抽样技术要么采用固定的抽样计划,要么根据简单的湿度调整抽样计划,它们无法选择最佳样本,用于不同阶段的模型培训。在认知科学中,由于“思维、快速和慢速”(系统1和系统2)的启发,我们提议了一项称为“适应性样和累赘(ASR)”的奖励制导抽样战略,以迎接这一挑战。为了实现这一宏伟目标,我们最了解的是,这是利用强化性学习(RL)处理抽样问题的第一个工作。我们的方法是最佳地调整取样进程,以达到最佳性能。我们通过远程取样探索样本之间的地理关系,以最大限度地累积总的报酬。我们用ASR来研究基于类似性损失的功能中长期存在的抽样问题。我们在信息检索和组合中得出了ASR的超强性性能,我们还将它命名为不同数据严重程度的实验。