The information ratio offers a way to assess how effectively an agent balances exploration and exploitation. It was originally defined as the ratio between the squared expected regret and the mutual information between the environment and the action-observation pair, which serves as a measure of information gain. Recent work has prompted consideration of alternative information measures, particularly in the analysis of bandit learning algorithms to arrive at tighter regret bounds. We investigate whether quantifying information via such alternatives can improve the realized performance of information-directed sampling, which aims to minimize the information ratio.
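To make the quantity concrete, the following is a minimal sketch of the information ratio and of one way to pick an action distribution that minimizes it. It assumes the per-action expected regrets and information gains have already been estimated and are passed in as plain arrays; how the gain is measured (mutual information or an alternative) is deliberately left outside the sketch. The function names `information_ratio` and `ids_action_distribution`, and the grid search over mixing probabilities, are illustrative choices and not taken from the paper. The search over pairs of actions relies on the known property that the information ratio admits a minimizing distribution supported on at most two actions.

```python
import numpy as np


def information_ratio(pi, regret, info_gain):
    """Squared expected regret divided by expected information gain under pi."""
    pi = np.asarray(pi, dtype=float)
    expected_regret = float(pi @ np.asarray(regret, dtype=float))
    expected_gain = float(pi @ np.asarray(info_gain, dtype=float))
    if expected_gain <= 0.0:
        # No information is gained, so the ratio is treated as infinite.
        return float("inf")
    return expected_regret ** 2 / expected_gain


def ids_action_distribution(regret, info_gain, grid_size=1001):
    """Approximately minimize the information ratio over action distributions.

    Searches mixtures of at most two actions on a grid of mixing
    probabilities (an assumption of this sketch, not an exact solver).
    """
    regret = np.asarray(regret, dtype=float)
    info_gain = np.asarray(info_gain, dtype=float)
    n = len(regret)
    qs = np.linspace(0.0, 1.0, grid_size)  # probability placed on action i

    best_ratio, best_pi = float("inf"), None
    for i in range(n):
        for j in range(i, n):
            delta = qs * regret[i] + (1.0 - qs) * regret[j]
            gain = qs * info_gain[i] + (1.0 - qs) * info_gain[j]
            with np.errstate(divide="ignore", invalid="ignore"):
                ratios = np.where(gain > 0.0, delta ** 2 / gain, np.inf)
            k = int(np.argmin(ratios))
            if ratios[k] < best_ratio:
                best_ratio = float(ratios[k])
                pi = np.zeros(n)
                pi[i] += qs[k]
                pi[j] += 1.0 - qs[k]
                best_pi = pi

    if best_pi is None:
        # No action yields information: fall back to the greedy action.
        best_pi = np.zeros(n)
        best_pi[int(np.argmin(regret))] = 1.0
    return best_pi, best_ratio
```

For example, with regrets `[0.1, 0.5]` and information gains `[0.0, 1.0]`, neither pure action minimizes the ratio: the first gains no information and the second has ratio 0.25, while mixing them with probabilities of roughly 0.75 and 0.25 gives a ratio of about 0.16. Swapping in a different information measure only changes the `info_gain` vector, which is the degree of freedom the abstract proposes to study.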