One of the main challenges in real-world reinforcement learning is to learn successfully from limited training samples. We show that in certain settings, the available data can be dramatically increased through a form of multi-task learning, by exploiting an invariance property in the tasks. We provide a theoretical performance bound for the gain in sample efficiency under this setting. This motivates a new approach to multi-task learning, which involves the design of an appropriate neural network architecture and a prioritized task-sampling strategy. We demonstrate empirically the effectiveness of the proposed approach on two real-world sequential resource allocation tasks where this invariance property occurs: financial portfolio optimization and meta federated learning.