All sequential decision-making agents explore so as to acquire knowledge about a particular target. It is often the responsibility of the agent designer to construct this target which, in rich and complex environments, constitutes an onerous burden; without full knowledge of the environment itself, a designer may forge a sub-optimal learning target that poorly balances the amount of information an agent must acquire to identify the target against the target's associated performance shortfall. While recent work has developed a connection between learning targets and rate-distortion theory to address this challenge and empower agents that decide what to learn in an automated fashion, the proposed algorithm does not optimally tackle the equally important challenge of efficient information acquisition. In this work, building upon the seminal design principle of information-directed sampling (Russo & Van Roy, 2014), we address this shortcoming directly to couple optimal information acquisition with the optimal design of learning targets. Along the way, we offer new insights into learning targets from the literature on rate-distortion theory before turning to empirical results that confirm the value of information when deciding what to learn.