Residual Overfit Method of Exploration

Exploration is a crucial aspect of bandit and reinforcement learning algorithms. The uncertainty quantification necessary for exploration often comes either from closed-form expressions based on simple models or from resampling and posterior approximations that are computationally intensive. We propose instead an approximate exploration methodology based on fitting only two point estimates, one tuned and one overfit. The approach, which we term the residual overfit method of exploration (ROME), drives exploration towards actions where the overfit model exhibits the most overfitting compared to the tuned model. The intuition is that overfitting occurs the most at actions and contexts with insufficient data to form accurate predictions of the reward. We justify this intuition formally from both a frequentist and a Bayesian information-theoretic perspective. The result is a method that generalizes to a wide variety of models and avoids the computational overhead of resampling or posterior approximations. We compare ROME against a set of established contextual bandit methods on three datasets and find it to be among the best performing.
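The abstract does not state the exact scoring rule, so the following is only a minimal sketch of the idea under stated assumptions. It assumes the two point estimates are ridge regressions that differ only in regularization strength, and that the exploration bonus is the absolute gap between their predictions, added to the tuned prediction with a weight `beta`. The names `fit_ridge`, `rome_scores`, `lam_tuned`, `lam_overfit`, and `beta` are hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Closed-form ridge regression; a smaller lam means weaker regularization."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def rome_scores(X_train, y_train, X_cand, lam_tuned=10.0, lam_overfit=1e-6, beta=2.0):
    """Score candidate actions by tuned prediction plus a residual-overfit bonus.

    Assumption: the bonus is |overfit prediction - tuned prediction|. The paper
    may define and scale this quantity differently.
    """
    w_tuned = fit_ridge(X_train, y_train, lam_tuned)      # regularized ("tuned") fit
    w_overfit = fit_ridge(X_train, y_train, lam_overfit)  # nearly unregularized fit
    pred_tuned = X_cand @ w_tuned
    residual = np.abs(X_cand @ w_overfit - pred_tuned)    # large where data are scarce
    return pred_tuned + beta * residual, residual


# Toy data: one-hot action features; action 0 is seen 100 times, action 1 only twice.
rng = np.random.default_rng(0)
actions = np.array([0] * 100 + [1] * 2)
X_train = np.eye(2)[actions]
y_train = 1.0 + 0.05 * rng.standard_normal(actions.size)

scores, residual = rome_scores(X_train, y_train, np.eye(2))
chosen_action = int(np.argmax(scores))
```

In this toy setup the two fits nearly agree on the well-observed action and disagree strongly on the rarely observed one, so the bonus steers selection towards the action with little data. In practice `lam_tuned` would be chosen by validation rather than fixed, and any model family that can be fit in both a tuned and an overfit configuration could stand in for ridge regression.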

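Related content

Overfitting

Overfitting (also called overlearning) in machine learning usually refers to a learned model that is too complex: it performs well on the training set but poorly on the test set, that is, it generalizes badly. It arises during parameter fitting because the training data contain sampling error, and a complex model fits that sampling error along with the underlying signal.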
