Eluder dimension and information gain are two widely used complexity measures in bandits and reinforcement learning. Eluder dimension was originally proposed as a general complexity measure of function classes, but the common examples where it is known to be small are function spaces (vector spaces). In these cases, the primary tool for upper bounding the eluder dimension is the elliptic potential lemma. Interestingly, the elliptic potential lemma also features prominently in the analysis of linear bandits/reinforcement learning and their nonparametric generalization, the information gain. We show that this is not a coincidence: eluder dimension and information gain are equivalent in a precise sense for reproducing kernel Hilbert spaces.
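To make the shared tool concrete, here is a minimal numerical sketch of the elliptic potential lemma in its standard linear-bandit form. It is not the paper's own statement or proof. Assume feature vectors x_1, ..., x_T in R^d with ||x_t|| <= L, and set V_0 = lam * I and V_t = V_{t-1} + x_t x_t^T. The lemma then gives sum_t min(1, ||x_t||^2_{V_{t-1}^{-1}}) <= 2 log(det V_T / det V_0) <= 2 d log(1 + T L^2 / (d lam)). The middle quantity also ties the lemma to information gain. For the linear kernel with lam equal to the noise variance, (1/2) log(det V_T / det V_0) equals the mutual information (1/2) log det(I + lam^{-1} K_T) at those points. The function name `elliptic_potential`, the parameter `lam`, and the random unit-norm features are illustrative assumptions only.

```python
import math
import random


def elliptic_potential(xs, lam):
    """Hypothetical helper: return (potential_sum, log_det_ratio).

    potential_sum = sum_t min(1, x_t^T V_{t-1}^{-1} x_t)
    log_det_ratio = log(det V_T / det V_0), where V_0 = lam * I and
    V_t = V_{t-1} + x_t x_t^T. V^{-1} is updated with Sherman-Morrison,
    so no explicit matrix inversion is needed.
    """
    d = len(xs[0])
    # V_0^{-1} = (1 / lam) * I
    v_inv = [[(1.0 / lam) if i == j else 0.0 for j in range(d)] for i in range(d)]
    potential_sum = 0.0
    log_det_ratio = 0.0
    for x in xs:
        # u = V_{t-1}^{-1} x
        u = [sum(v_inv[i][j] * x[j] for j in range(d)) for i in range(d)]
        # q = ||x||^2 in the V_{t-1}^{-1} norm
        q = sum(x[i] * u[i] for i in range(d))
        potential_sum += min(1.0, q)
        # Matrix determinant lemma: det V_t = det V_{t-1} * (1 + q)
        log_det_ratio += math.log1p(q)
        # Sherman-Morrison (V^{-1} is symmetric): V_t^{-1} = V_{t-1}^{-1} - u u^T / (1 + q)
        v_inv = [[v_inv[i][j] - u[i] * u[j] / (1.0 + q) for j in range(d)]
                 for i in range(d)]
    return potential_sum, log_det_ratio


# Illustrative run with random unit-norm features (an assumption, not data from the paper).
random.seed(0)
d, T, lam, L = 3, 500, 1.0, 1.0
xs = []
for _ in range(T):
    v = [random.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    xs.append([L * c / norm for c in v])

pot, ldr = elliptic_potential(xs, lam)
info_gain = 0.5 * ldr  # linear-kernel information gain at these points (lam = noise variance)
bound = 2 * d * math.log(1 + T * L ** 2 / (d * lam))
print(pot, 2 * ldr, bound)  # expected ordering: pot <= 2 * ldr <= bound
```

The first inequality follows from min(1, u) <= 2 log(1 + u) for u >= 0. The second follows from bounding det V_T by (trace(V_T) / d)^d, which is at most (lam + T L^2 / d)^d.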