Partial monitoring is an expressive framework for sequential decision-making with an abundance of applications, including graph-structured and dueling bandits, dynamic pricing, and transductive feedback models. We survey and extend recent results on the linear formulation of partial monitoring, which naturally generalizes the standard linear bandit setting. The main result is that a single algorithm, information-directed sampling (IDS), is (nearly) worst-case rate optimal in all finite-action games. We present a simple and unified analysis of stochastic partial monitoring, and further extend the model to the contextual and kernelized settings.