In this paper, we study both multi-armed and contextual bandit problems in censored environments. Our goal is to estimate the performance loss that censorship imposes on classical algorithms designed for uncensored environments. Our main contributions are the introduction of a broad class of censorship models and their analysis in terms of the effective dimension of the problem, a natural measure of its underlying statistical complexity and the main driver of the regret bound. In particular, the effective dimension allows us to maintain the structure of the original problem at first order while embedding it in a bigger space, and thus naturally leads to results analogous to those for uncensored settings. Our analysis involves a continuous generalization of the Elliptical Potential Inequality, which we believe is of independent interest. We also discover an interesting property of decision-making under censorship: a transient phase, during which initial misspecification of censorship is self-corrected at an extra cost, is followed by a stationary phase that reflects the inherent slowdown of learning governed by the effective dimension. Our results are useful for applications of sequential decision-making models in which the feedback received depends on strategic uncertainty (e.g., agents' willingness to follow a recommendation) and/or random uncertainty (e.g., loss or delay in arrival of information).