We consider a set of APs with unknown data rates that cooperatively serve a mobile client. The data rate of each link is i.i.d. sampled from a distribution that is unknown a priori. In contrast to traditional link scheduling problems under uncertainty, we assume that in each time step, the device can probe a subset of links before deciding which one to use. We model this problem as a contextual bandit problem with probing (CBwP) and present an efficient algorithm. We further establish the regret of our algorithm for links with Bernoulli data rates. Our CBwP model is a novel extension of the classic contextual bandit model and can potentially be applied to a large class of sequential decision-making problems that involve joint probing and play under uncertainty.
翻译:我们考虑的是一组数据率不明的AP, 这些数据率是合作为移动客户服务的。 每个链接的数据率是先验的分布样本。 与不确定的传统的链接时间安排问题相反, 我们假设在每一步, 设备可以在决定使用之前先探测一组链接。 我们把这个问题作为调查( CBwP) 的背景强盗问题来模型, 并展示一个高效的算法。 我们进一步确认了我们对与Bernoulli数据率连接的算法的遗憾。 我们的 CBwP模型是经典背景强盗模型的新扩展, 并有可能适用于涉及在不确定情况下联合探测和玩耍的一大批顺序决策问题 。