A key challenge in multi-agent reinforcement learning is generalising over intractably large state-action spaces. Inspired by Tesseract [Mahajan et al., 2021], this position paper investigates generalisation in state-action space over unexplored state-action pairs by modelling the transition and reward functions as tensors of low CP-rank. Initial experiments on synthetic MDPs show that using tensor decompositions in a model-based reinforcement learning algorithm can lead to much faster convergence when the true transition and reward functions are indeed of low rank.
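As a minimal sketch of the core idea (not the paper's code), the snippet below builds a reward tensor R[s, a1, a2] over states and two agents' actions that is of low CP-rank by construction, fits a CP decomposition of the same rank, and compares the parameter counts of the full tensor against the CP factors. The use of tensorly's `parafac`, and all tensor sizes and variable names, are illustrative assumptions, not from the paper.

```python
# Sketch: a low CP-rank reward tensor can be represented (and hence generalised
# from) with far fewer parameters than the full tensor. Sizes are illustrative.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac
from tensorly.cp_tensor import cp_to_tensor

rng = np.random.default_rng(0)
n_states, n_actions, rank = 20, 5, 3  # hypothetical sizes

# Ground-truth rank-3 reward tensor: R[s, a1, a2] = sum_r U[s, r] V[a1, r] W[a2, r].
U = rng.standard_normal((n_states, rank))
V = rng.standard_normal((n_actions, rank))
W = rng.standard_normal((n_actions, rank))
R_true = np.einsum('sr,ar,br->sab', U, V, W)

# Fit a CP decomposition of matching rank and check the reconstruction error.
weights, factors = parafac(tl.tensor(R_true), rank=rank, n_iter_max=500, tol=1e-10)
R_hat = tl.to_numpy(cp_to_tensor((weights, factors)))
rel_err = np.linalg.norm(R_hat - R_true) / np.linalg.norm(R_true)
print(f"relative reconstruction error: {rel_err:.2e}")

# Parameter counts: full tensor vs. CP factors -- the gap is what allows
# generalisation to unexplored (s, a1, a2) entries when the true rank is low.
full_params = n_states * n_actions * n_actions
cp_params = rank * (n_states + 2 * n_actions)
print(f"full tensor entries: {full_params}, CP parameters: {cp_params}")
```

In a model-based setting, the same factorisation would be fit to partially observed transition and reward tensors rather than a fully known one; the sketch above only illustrates why a low-rank structure shrinks the number of quantities that must be estimated.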