We consider a rate-constrained contextual multi-armed bandit (RC-CMAB) problem, in which a group of agents are solving the same contextual multi-armed bandit (CMAB) problem. However, the contexts are observed by a remotely connected entity, i.e., the decision-maker, that updates the policy to maximize the returned rewards, and communicates the arms to be sampled by the agents to a controller over a rate-limited communications channel. This framework can be applied to personalized ad placement, whenever the content owner observes the website visitors, and hence has the context, but needs to transmit the ads to be shown to a controller that is in charge of placing the marketing content. Consequently, the rate-constrained CMAB (RC-CMAB) problem requires the study of lossy compression schemes for the policy to be employed whenever the constraint on the channel rate does not allow the uncompressed transmission of the decision-maker's intentions. We characterize the fundamental information theoretic limits of this problem by letting the number of agents go to infinity, and study the regret that can be achieved, identifying the two distinct rate regions leading to linear and sub-linear regrets respectively. We then analyze the optimal compression scheme achievable in the limit with infinite agents, when using the forward and reverse KL divergence as distortion metric. Based on this, we also propose a practical coding scheme, and provide numerical results.
翻译:我们考虑的是受速率限制的背景多武装土匪(RC-CMAB)问题,在这个问题上,一组代理商正在解决同样的背景多武装土匪问题,然而,一个远程连通的实体,即决策者观察到了背景情况,他们更新了政策,以尽量获得回报,并将武器通过代理商抽样交给一个受费率限制的通信频道控制员。这个框架可以适用于个性化广告的放置,只要内容所有者观察网站访客,因此有其背景,但需要向负责放置营销内容的主计长展示广告。因此,受费率限制的CMAB(RC-CMAB)问题要求研究损失压缩办法,以便当对频道费率的限制不允许不受抑制地传输决策者的意图时,该政策可以使用。 我们把这一问题的基本信息描述为该基本信息的范围,让代理商的数量达到无限,并研究能够实现的遗憾,确定两个不同的比例,即受费率限制的CMAB(RC-CAB)问题需要研究该政策的损失压缩办法,一旦对频道收费率的限制,我们同时提出可实现的平流和分级标准,我们再提出一个可实现的、可实现的卡路路面和分级的汇率。