We consider $Q$-learning with knowledge transfer, using samples from a target reinforcement learning (RL) task as well as source samples from different but related RL tasks. We propose transfer learning algorithms for both batch and online $Q$-learning with offline source studies. The proposed transferred $Q$-learning algorithm contains a novel re-targeting step that enables vertical information cascading along multiple steps of an RL task, in addition to the usual horizontal information gathering as in transfer learning (TL) for supervised learning. We establish the first theoretical justifications of TL in RL tasks by showing a faster rate of convergence of the $Q$-function estimation in the offline RL transfer, and a lower regret bound in the offline-to-online RL transfer, under certain similarity assumptions. Empirical evidence from both synthetic and real datasets is presented to support the proposed algorithm and our theoretical results.