We study neural-linear bandits for solving problems where {\em both} exploration and representation learning play an important role. Neural-linear bandits harness the representation power of Deep Neural Networks (DNNs) and combine it with efficient exploration mechanisms by leveraging the uncertainty estimates of a linear contextual bandit defined on top of the last hidden layer. To mitigate the problem of the representation changing during learning, new uncertainty estimates are computed using stored data from an unlimited buffer. However, when the amount of stored data is limited, a phenomenon known as catastrophic forgetting emerges. To alleviate this, we propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online. We applied our algorithm, Limited Memory Neural-Linear with Likelihood Matching (NeuralLinear-LiM2), to a variety of datasets and observed that it achieves performance comparable to the unlimited-memory approach while exhibiting resilience to catastrophic forgetting.