In this paper, we study representation learning for multi-task decision-making in non-stationary environments. We consider the framework of sequential linear bandits, where the agent performs a series of tasks drawn from distinct sets associated with different environments. The embeddings of tasks within each set share a low-dimensional feature extractor, called a representation, and the representations differ across sets. We propose an online algorithm that facilitates efficient decision-making by learning and transferring non-stationary representations in an adaptive fashion. We prove that our algorithm significantly outperforms existing algorithms that treat tasks independently. We also conduct experiments on both synthetic and real data to validate our theoretical insights and demonstrate the efficacy of our algorithm.