We propose and analyze a kernelized version of Q-learning. Although a reproducing kernel Hilbert space is typically infinite-dimensional, extensive study has shown that generalization depends only on the effective dimension of the data. We incorporate these ideas into the Q-learning framework and derive regret bounds for arbitrary kernels. In particular, we provide concrete bounds for linear kernels and Gaussian RBF kernels; notably, the latter bound is nearly identical to the former, except that the ambient dimension is replaced by a different notion of dimensionality. Finally, we test our algorithm on a suite of classic control tasks; remarkably, under the Gaussian RBF kernel, it achieves reasonably good performance after only 1,000 environment steps, while its neural network counterpart, deep Q-learning, still struggles.
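To make the setting concrete, below is a minimal illustrative sketch of what a kernelized Q-update could look like: Q-values are fit by kernel ridge regression with a Gaussian RBF kernel over visited state-action pairs, with bootstrapped targets as in standard Q-learning. This is only a sketch under our own assumptions; the class name `KernelQ` and the parameters `reg` (ridge regularizer) and `bandwidth` (RBF parameter) are hypothetical and not taken from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, bandwidth=1.0):
    """Gaussian RBF kernel: k(x, y) = exp(-bandwidth * ||x - y||^2)."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-bandwidth * sq_dists)

class KernelQ:
    """Illustrative kernelized Q-learning: Q-values via kernel ridge
    regression over observed state-action features (hypothetical sketch)."""

    def __init__(self, n_actions, discount=0.99, reg=1.0, bandwidth=1.0):
        self.n_actions = n_actions
        self.discount = discount   # discount factor
        self.reg = reg             # ridge regularizer (assumed)
        self.bandwidth = bandwidth # RBF kernel parameter (assumed)
        self.X = None              # stored features phi(s, a)
        self.y = None              # regression targets
        self.alpha = None          # kernel regression weights

    def _feat(self, s, a):
        # Joint feature: state concatenated with a one-hot action encoding.
        one_hot = np.zeros(self.n_actions)
        one_hot[a] = 1.0
        return np.concatenate([np.asarray(s, dtype=float), one_hot])

    def q_values(self, s):
        # Before any data is seen, return zero estimates for all actions.
        if self.X is None:
            return np.zeros(self.n_actions)
        Z = np.stack([self._feat(s, a) for a in range(self.n_actions)])
        K = rbf_kernel(Z, self.X, self.bandwidth)
        return K @ self.alpha

    def update(self, s, a, r, s_next):
        # Bootstrapped target, as in standard Q-learning:
        # target = r + discount * max_a' Q(s', a').
        target = r + self.discount * self.q_values(s_next).max()
        x = self._feat(s, a)[None, :]
        self.X = x if self.X is None else np.vstack([self.X, x])
        self.y = (np.array([target]) if self.y is None
                  else np.append(self.y, target))
        # Refit: solve (K + reg * I) alpha = y on all stored pairs.
        K = rbf_kernel(self.X, self.X, self.bandwidth)
        self.alpha = np.linalg.solve(K + self.reg * np.eye(len(self.y)), self.y)
```

In this sketch the effective dimension enters implicitly through the spectrum of the Gram matrix `K`; with a linear kernel the same code reduces to ridge regression in the raw feature space.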