While multitask representation learning has become a popular approach in reinforcement learning (RL) to boost sample efficiency, the theoretical understanding of why and how it works is still limited. Most previous analyses could only assume that the representation function is either already known to the agent or drawn from a linear function class, since analyzing representations from a general function class runs into non-trivial technical obstacles, such as establishing generalization guarantees and formulating confidence bounds in an abstract function space. However, the linear-case analysis relies heavily on the particular structure of the linear function class, while real-world practice usually adopts general non-linear representation functions such as neural networks, which significantly limits its applicability. In this work, we extend the analysis to general function class representations. Specifically, we consider an agent playing $M$ contextual bandits (or MDPs) concurrently and extracting a shared representation function $\phi$ from a given function class $\Phi$ using our proposed Generalized Functional Upper Confidence Bound (GFUCB) algorithm. We theoretically validate, for the first time, the benefit of multitask representation learning within a general function class for bandits and linear MDPs. Lastly, we conduct experiments to demonstrate the effectiveness of our algorithm with neural network representations.
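The GFUCB algorithm itself is specified in the paper body; purely as an illustrative toy of the setting described above, the sketch below runs $M$ contextual bandit tasks that share one representation drawn from a small finite class $\Phi$, selecting the representation by joint least-squares fit across all tasks and then acting with a per-task LinUCB-style rule on top of it. All names, constants, and the specific selection rule here are hypothetical simplifications, not the paper's GFUCB.

```python
import numpy as np

rng = np.random.default_rng(0)
M, d, K, T = 4, 2, 5, 200  # tasks, feature dim, arms per round, rounds

# Toy representation class Phi: a few fixed feature maps (illustrative only).
PHI = [
    lambda x: x,           # identity -- the true representation below
    lambda x: np.sin(x),
    lambda x: x ** 2,
]
true_phi = 0
theta = rng.normal(size=(M, d))  # per-task linear heads on top of phi

def reward(m, x):
    """Noisy linear reward in the true shared representation."""
    return PHI[true_phi](x) @ theta[m] + 0.1 * rng.normal()

X = [[] for _ in range(M)]  # observed contexts per task
Y = [[] for _ in range(M)]  # observed rewards per task

def joint_loss(phi_idx):
    """Sum over tasks of least-squares loss when fitting each task's
    linear head on top of the candidate representation."""
    total = 0.0
    for m in range(M):
        if not X[m]:
            continue
        F = np.array([PHI[phi_idx](x) for x in X[m]])
        y = np.array(Y[m])
        w, *_ = np.linalg.lstsq(F, y, rcond=None)
        total += np.sum((F @ w - y) ** 2)
    return total

for t in range(T):
    # Pick the representation that best explains ALL tasks' data so far:
    # this is where the multitask data sharing happens.
    phi_idx = min(range(len(PHI)), key=joint_loss)
    for m in range(M):
        contexts = rng.normal(size=(K, d))
        F = np.array([PHI[phi_idx](x) for x in contexts])
        # Ridge estimate of the task head plus an elliptical UCB bonus.
        if X[m]:
            Fh = np.array([PHI[phi_idx](x) for x in X[m]])
            A = Fh.T @ Fh + np.eye(d)
            w = np.linalg.solve(A, Fh.T @ np.array(Y[m]))
        else:
            A, w = np.eye(d), np.zeros(d)
        Ainv = np.linalg.inv(A)
        ucb = F @ w + np.sqrt(np.einsum('kd,de,ke->k', F, Ainv, F))
        a = int(np.argmax(ucb))  # optimistic arm choice
        X[m].append(contexts[a])
        Y[m].append(reward(m, contexts[a]))

print("selected representation index:", phi_idx)
```

With enough shared data, the joint fit identifies the true representation, illustrating how pooling $M$ tasks' samples helps pin down $\phi$ faster than any single task could.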