In this work we consider a reinforcement learning (RL) technique for solving personalization tasks with complex reward signals. Our approach clusters the state space with a simple $k$-means algorithm and otherwise relies on conventional choices of network architectures and optimization algorithms. Numerical examples demonstrate the efficiency of different RL procedures and illustrate that this technique accelerates the agent's learning without restricting its performance.
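As a rough illustration of what state space clustering with $k$-means could look like, the following minimal sketch groups raw state vectors into $k$ clusters and maps each state to the index of its nearest centroid. The abstract does not specify the implementation, so everything here is an assumption made for illustration: the function names `kmeans` and `cluster_of`, the synthetic two-dimensional states, the choice $k = 3$, and the one-hot encoding passed to the agent are all hypothetical.

```python
import numpy as np

def kmeans(states, k, n_iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct, randomly chosen states.
    centroids = states[rng.choice(len(states), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # Assign each state to its nearest centroid (squared Euclidean distance).
        dists = ((states[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; leave empty clusters in place.
        for j in range(k):
            members = states[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, labels

def cluster_of(state, centroids):
    """Map a raw state to the index of its nearest centroid."""
    return int(((centroids - state) ** 2).sum(axis=1).argmin())

# Hypothetical data: 300 two-dimensional state vectors scattered around three centres.
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
states = np.concatenate([c + 0.3 * rng.standard_normal((100, 2)) for c in centres])

centroids, labels = kmeans(states, k=3)

# One possible compact representation an agent could consume instead of the raw state.
one_hot = np.eye(3)[cluster_of(states[0], centroids)]
```

Under these assumptions the agent sees a discrete cluster index, or its one-hot encoding, rather than the full state, which is one way a smaller effective state space might speed up learning.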